Redirection home


Europe’s researchers should grab every opportunity to ensure that funds redirected towards 
strategic investment will not miss science altogether. 


04 February 2015 


The almost 19,000 followers of @EU_H2020 — the official Twitter feed of the European Union's flagship 
funding scheme Horizon 2020 — have already had much to discuss this year. Highlights include plans 
being drawn up by research commissioner Carlos Moedas on how to manage scientific advice after the 
abrupt axing of the chief adviser post held by Anne Glover; a live stream of a green-transport event; and 
the announcement of the first grants, including cash for projects to work on robots that wash floors and 
harvest sweet peppers. 


However, @EU_H2020 has been quiet on a move by commission president Jean-Claude Juncker to raid the
Horizon 2020 budget for money to help set up a continent-wide investment fund. The floor-washing
robots are safe: Juncker wants to drain the cash, some €2.7 billion (US$3.1 billion), from other parts
of the budget, details of which were announced through more traditional routes last month.


Hardest hit is the European Institute of Innovation & Technology in Budapest, which will lose
€350 million over the next six years. The European Research Council will lose €221 million,
starting next year.


Also targeted is cash earmarked for projects across the continent over the coming years, including from 
information and communications technology, which will lose €307 million, food (€181 million) and 
nanotechnology, biotechnology and other advanced manufacturing techniques (€169 million). 


If Juncker’s proposal is approved by the European Parliament and Council, then the €2.7 billion will form 
part of a €16-billion European Fund for Strategic Investments that the European Commission hopes will 
stimulate state and private investment and lift the continent’s stagnant economy. 


@EU_H2020 might have been quiet on the move, but there have been howls of protest from those on 
the receiving end of the cuts. 


“Horizon 2020 is not a lemon! Stop squeezing it!” was the sharp response from the League of European 
Research Universities in Leuven, Belgium, when the cuts were first suggested last year. And the 
advocacy group EuroScience said that it “is not in principle against using a small part of the Horizon 
2020 budget for this purpose”, but that taking the money from the European Research Council sent “a 
very bad signal”. The European Research Area’s Stakeholders Platform, an umbrella group of various
organizations, expressed “great concern” and warned that the cuts would undermine research and
innovation efforts across Europe. 


In response, European Commission officials say that the cuts come from an already generous budget:
the original €80 billion in spending planned through Horizon 2020 makes it the most lucrative
research funding scheme of its type in the world. The €2.7-billion reduction, they point out, could
have been worse, and leaves the bulk of the programme intact. They argue that the funds will not
truly be lost from science and research; they will return with interest when the strategic fund
begins to bear economic fruit.


Perhaps. But it is easy to have sympathy for the organizations that were banking on that money and 
must now try to fill the hole. It is also easy to question the use of the word ‘strategic’ in the title of the 
fund. Strategy is long-term, and the best and most enduring route to prosperity must remain the careful 
allocation of investment to research on science and technology — both pure and applied. 


Still, as Nature has argued before, scientists must accept that the boom times are over, at least for now. 
Money is tight and priorities are shifting. Those in Europe would do well to remember that. 


The new fund could be up and running as soon as September, so some scientists could still be waiting 
to hear whether they will join the pepper-picking robot researchers in receiving a Horizon 2020 grant 
(chances are that they won't; the programme is massively oversubscribed, sorry). In principle, research
could yet benefit from the redirected money, but scientists and their representatives must lobby for 
research and innovation to have a central role in the projects — infrastructure and the rest — in which 
the new fund will invest. The European Research Area’s Stakeholders Platform has suggested 
amendments to the proposed legislation to make that happen, including giving researchers a say in how 
the money is allocated, and European officials should listen to that advice. 


Science may have lost out on the money, but it should not miss out on the opportunity. 


Nature 518, 5 (05 February 2015)  doi:10.1038/518005a 
House of cards


Western institutions must speak out against human-rights abuses in their partner countries. 
03 February 2015 


When the leaders of many of the world’s democracies flocked to Saudi Arabia last week to offer their 
condolences on the death of King Abdullah, many critics called it hypocrisy. They did so, too, when 
Saudi officials marched in Paris two weeks earlier to defend freedom of expression following the terrorist 
attacks there. 


After all, Saudi Arabia comes near the bottom of the world league in terms of freedoms, such as the
right to dissent, to freedom of expression or to practise any religion other than Islam, and has a
track record of brutal human-rights abuses and political and religious oppression. But the kingdom’s
oil and strategic geopolitical importance in the turbulent Middle East means that it has long enjoyed
strong ties with the West.


Some scientists have been drawn to the desert state too, not least to the King Abdullah University
of Science and Technology (KAUST) in Thuwal, a graduate university created by the king in 2009,
which has a US$20-billion endowment. The university is the flagship of Abdullah’s efforts both to
build a knowledge-based society in a country with little science base and to help distance science
and education from the stifling influence and control of conservative clerics.


As we report on page 18, some of these scientists have become caught up in the controversy over 
Saudi Arabia’s human-rights record. An international outcry has been sparked by the Saudi authorities’ 
flogging of the activist Raif Badawi in a public square in January — the first 50 of a sentence of 1,000 
lashes, along with 10 years in prison, for posts that he introduced on his website for social and political 
discussion. 


The Badawi case once again highlights the responsibility of researchers and scientific institutions who 
collaborate with authoritarian and repressive regimes such as Saudi Arabia to denounce human-rights 
abuses. Eighteen Nobel laureates explicitly raised that point in a letter last month to the president of 
KAUST, calling for “influential voices in KAUST” to speak out against Badawi’s brutal treatment, arguing 
that no university can be viable in a society lacking basic freedoms. 


Some scientists and their institutions, such as the US National Academies of Science, have a long 
history of speaking out to defend freedoms, and of campaigning on behalf of persecuted academics and 
activists, although too many others remain silent. Still, there are concerns that such lobbying has
lessened in recent years, with several scientific human-rights bodies, including those of the New York 
Academy of Sciences and the American Association for the Advancement of Science, shifting their 
focus to scientific diplomacy and softer human-rights issues, such as access to education, clean water, 
food and health care. Some have argued that working to open up repressive countries is more effective 
in the long term than publicly embarrassing them over individual cases of abuse. 


Others have rightly expressed concern that scientists and their institutions may be increasingly
reluctant to speak out to avoid jeopardizing collaborations with countries, including China, that
have dismal human-rights records. The many Western universities that have partnerships with KAUST
and other Saudi institutions benefit from petrodollars, and the leading researchers who have joined
the KAUST faculty benefit from competitive salaries and state-of-the-art laboratory conditions.
Western universities have also gained from the influx of hundreds of thousands of fee-paying Saudi
students under a generous scholarship scheme established by King Abdullah.


What can scientists there achieve by speaking out? Foreign researchers working at KAUST who were 
contacted by Nature seem sincerely convinced that, by educating and broadening the horizons of young 
Saudi Arabians, they can do more good by working to help to slowly open up the regime. The scientists 
are to be applauded for their efforts — this journal has long backed scientific cooperation as a form of 
diplomacy, for example with Iran, and has similarly opposed proposed scientific boycotts of Israel. 


Unfortunately, change cannot be expected to come quickly in Saudi Arabia because of the unique 
complexity of its society and culture. As Europe’s Enlightenment was taking shape in the eighteenth 
century, pushing back against religious authority and ushering in modern science, the Arabian peninsula 
was heading in the opposite direction. The Saudi state was born at the time out of an unholy alliance 
between Ibn Saud, a tribal leader, and Muhammad Ibn Abd al-Wahhab, the leader of Wahhabism, an 
extreme fundamentalist sect of Sunni Islam. That pact shapes Saudi rule and society to this day, 
resulting in a symbiotic agreement, with the conservative clerics giving the monarchy its support in return 
for their power to impose a society based on radical Islam, and an extreme form of sharia law. 


But there does not need to be a conflict between defending individual cases — either publicly or by more 
diplomatic, behind-the-scenes pressure — and broader outreach efforts. We need both. Campaigns for 
persecuted individuals whose plights otherwise risk going unnoticed can also, as in Badawi’s case, send 
the powerful message that the world is watching. Scientists at KAUST are perhaps not best placed to 
speak out, being at risk of potential retribution. But Saudi Arabia benefits hugely, not least in terms of its 
international image, from prominent collaborations with Western research organizations and universities, 
which have a duty to use that leverage to speak out on abuses, and to call for greater democratic 
reforms — both publicly and in their private dealings with their Saudi partners. 


Nature 518, 5-6 (05 February 2015) doi:10.1038/518005b


Road test 


Realizing the benefits of driverless cars will 
require governments to embrace the technology. 


The UK government funding agency Innovate UK has launched a
£10-million (US$15-million) project to study how autonomous,
self-driving vehicles will fit into daily life in four parts of England:
Greenwich, Coventry, Milton Keynes and Bristol.

Good job. That is the right kind of question to ask about driverless
cars. As described in a News Feature on page 20, developers such as
Google are making rapid progress on the vehicles. From a technical
standpoint, the cars could be ready for widespread deployment within
a decade. But when and how they will hit the streets depends on how
well people accept and trust them.

Consider, for example, the obvious economic question: will people be
able to afford them? Thanks to the need for sophisticated equipment, the
vehicles are likely to be much more expensive than their conventional
counterparts, at least initially. And that means that buyers will need to
see correspondingly large benefits.

A frequently cited benefit is safety: advocates insist that the
vehicles could all but eliminate accidents. But convincing people that
driverless cars can do away with human accidents and not make
robot-minded mistakes of their own is likely to take a good number
of years and millions of kilometres of accident-free test drives.

And when accidents do happen — as they surely will — public 
reaction will depend on the specifics of the event, and those are hard 
to predict. The legal issues may be even tougher. Right now, equipment 
failures are rare and the responsibility almost always rests with a driver. 
But with driverless vehicles, the courts and insurance companies will 
have to figure out how to apportion liability among the vehicle's
occupants (who may be dozing off), the car maker, the software developers
and even the mapping algorithm. 

Another much-touted benefit is fuel efficiency. But that is unlikely to 
be realized until most cars are equipped with systems that allow them to 
communicate with one another (called V2V systems) and with traffic 
signals to minimize stop-and-go traffic. 

Of course, some wealthy people will doubtless take the plunge. But the 
most important early adopters will probably be fleet operators: driverless 
ride-share systems could function as a new form of mass transit. And if 
the door-to-door service encourages more people to give up their car, 
then some of the vast areas devoted to parking could be put to other uses. 

Governments are likely to be crucial to the transition, not least
because many of the benefits accrue to society as a whole. A good
example is being set by the United States, which is considering a mandate
that would greatly speed up the transition by requiring V2V radios in every
new US car. Other countries should follow suit. To make such moves fully
effective, however, local governments will need to start upgrading
roadways with smart signals designed to optimize traffic flow, assuming
they can find the money.
WORLD VIEW


Major biodiversity initiative needs support


An effort aimed at protecting ecosystems, modelled on the agency battling climate
change, will need protecting from powerful enemies, warns Ehsan Masood.


chairman of the Intergovernmental Panel on Climate Change
(IPCC), when he rose to address a major conference on
biodiversity in Bonn, Germany, late last month. His signature green tie
was absent; a red alternative hung in its place.

Red for danger, Pachauri said, to acknowledge the peril facing
ecosystems and much of the natural world. Danger, he added, pausing
for effect, was not a word he could use in the highly politicized
context of climate change. Researchers who investigate and log Earth’s
diminishing biodiversity, he was hinting, have yet to encounter the
kind of distortions and politicization that are a regular feature for those
who work on global warming. But for how long will that continue?

The Bonn conference was the third plenary meeting for a major
initiative that explicitly aims to mimic the workings and impact of the
IPCC, including eventually drawing up laws that would put a scientific
brake on rampant development. As such, it is likely to make powerful
enemies. One of its first reports will assess the state of pollinating
insects. Others will explore the highly charged question of how to value
ecology. The red tie is a sign of things to come.

The initiative, the Intergovernmental Platform on Biodiversity and
Ecosystem Services (IPBES), was set up three years ago, although
the idea was first mooted in the 1990s. The mood in Bonn was upbeat as
delegates agreed its annual US$9-million budget and put the seal on a
busy programme of work for the next five years.

Recent controversies over the IPCC (claimed errors in its reports and
debate about whether the panel should even continue in its present form)
might seem to make the organization a dubious role model. Does the world
really need another lumbering process that involves hundreds of scientists,
who anyway need to have their final work signed off by representatives of
politicians?

Such a view underestimates the IPCC’s impact in one crucial area: to
provide political impetus and an evidence-backed mandate for
international legislation. The agency's second assessment report, the one
confirming a human fingerprint on climate, overcame political dissent
at the time and led directly to the 1997 Kyoto Protocol, which remains
the only agreement that legally binds states to reduce their
greenhouse-gas emissions. The protocol is unfashionable in some
climate-policy circles now, but the IPCC remains the model to drive a
majority of the world’s governments to change laws in response


convention, an international agreement in which member countries 
promise to protect, sustainably use and share the benefits of
biodiversity, lacks teeth and has made little impact on slowing biodiversity
loss. One-eighth of birds, one-quarter of mammals and one-third of 
amphibians are understood to be facing the threat of extinction, IPBES 
chair Zakri Abdul Hamid told the conference. The present consensus is 
that the rate of extinction is somewhere between 100 and 1,000 times 
the pre-industrial background rate. 

The first global report from the new biodiversity panel — similar 
to the periodic landmark assessment reports from the IPCC — is due 
in 2019. The initial shots in the conflict that could follow have already 
been fired. The United States so far finds itself unable to pay for its 
scientists to contribute; most of the money for the exercise, moreover, 
is coming from European countries. Not coincidentally,
the United States has still to ratify the
biodiversity convention, which many lawmakers, 
Republicans in particular, regard as anti-growth. 

Still, insiders expect the US government and 
its national institutions to play a bigger part in 
the coming months. One of its tasks, along with 
Europe and Latin America, will be to protect the 
role that conservation and industry groups have 
in the IPBES as observers. Some countries, notably
China, seem to want to restrict this.

There is one major difference from the IPCC. 
Each IPBES assessment must include reference 
citations to indigenous knowledge, and every 
review panel must include experts in this. That 
is partly a concession to some developing countries
that, for many years, resisted the idea of the
IPBES, fearing that it would be based, like most
IPCC reports, on studies in peer-reviewed European-language
journals. It also reflects the fact that, by definition, most of the planet’s
remaining biodiversity is in developing nations. 

In the science ministries of powerhouse nations, the study of 
indigenous knowledge is viewed as soft, flaky even. Compared with 
fields such as plant genetics, it is also less likely to be recognized by 
many leading science academies. Already, some representatives from 
Europe have complained that they cannot find suitably qualified
individuals to conduct or review assessments.

Anticipating this, the IPBES has set aside funding to train and 
identify suitable experts, especially from developing countries. IPBES 
leaders should cast the net further, and draw in more experts from the 
social sciences and others — through learned societies of humanities 
scholars, for example. Biodiversity needs all the help it can get.


Ehsan Masood is editor of Research Fortnight and a co-author with 
Daniel Schaffer of Dry: Life Without Water. 
e-mail: ehsan.masood@researchresearch.com; Twitter: @EhsanMasood 
RESEARCH HIGHLIGHTS
Selections from the scientific literature


What drives 
sugar addiction 


Researchers have identified 
the brain circuits that compel 
mice to seek out sweet treats. 

Kay Tye of the 
Massachusetts Institute of 
Technology in Cambridge 
and her colleagues genetically 
engineered mice so that the 
neurons in a brain circuit 
involved in reward processing 
would fire when exposed to 
light. When the researchers 
activated these neurons, the 
animals sought sugar more 
frequently through a port in 
their cage, even when they 
received a mild electric shock 
to their feet in the process. 
Switching the neurons off 
stopped the sugar-seeking 
behaviour, but did not prevent 
the mouse from eating its 
normal food. 

The researchers propose 
that targeting this pathway 
could be a possible therapy for
compulsive overeating. 

Cell 160, 528-541 (2015) 


Robot zips away
like an octopus


A self-propelling, octopus-inspired rocket can zoom through water
using energy more efficiently than fish that accelerate rapidly to escape.

Gabriel Weymouth of the University of Southampton, UK, and his
colleagues were inspired by the escape manoeuvres of the octopus,
which fills its body with water and then quickly expels it to dart
away. On the basis of this principle, the authors built their rocket
using a flexible hull, which they inflated with water (pictured). They
then measured the rocket’s speed as it shot the fluid out through its
base. As the rocket contracted, it achieved more than 2.6 times the
thrust of a rigid rocket doing the same manoeuvre.

The researchers calculate that making the robot bigger would
improve its accelerating performance, and suggest that their
technology could be used in underwater vehicles.

Bioinspir. Biomim. http://dx.doi.org/10.1088/1748-3190/10/1/016016 (2015)


Birds diversify at close quarters


A population of birds in California has evolved diverse bill sizes
and shapes, even within a small geographic area.

Diversification within a species is thought to occur mainly when
populations are separated by geographical barriers, such as mountains
or bodies of water. To see if this happens in a single population,
Kathryn Langin at Colorado State University in Fort Collins and
her colleagues studied more than 500 island scrub-jays (Aphelocoma
insularis; pictured), which have small ranges and live only on the
250-square-kilometre Santa Cruz Island in California.

The team found that birds living in pine forests had longer,
shallower bills (presumably for prying open pine cones) than jays in oak
forests, even though the two habitats were next to each other.
Adaptations at this microgeographic level could be more common than
once thought, even for mobile animals, the authors suggest.

Evolution http://doi.org/zt3 (2015)


Smoke makes
tornadoes worse


Smoke in the air could increase the likelihood of tornadoes forming,
as well as their intensity.

Pablo Saide of the University of Iowa in Iowa City and his
colleagues studied a major tornado outbreak that killed more than
300 people in April 2011 in the southeastern United States. It
occurred when smoke particles from fires in Central America drifted
across the region, which the team’s calculations suggest led to
atmospheric changes, such as thicker and lower clouds. These had
several cascading effects, including stronger wind shear, that make
tornadoes more likely to form and more severe in areas that are prone
to such storms. Meteorologists may need to

Mathematical model helps scientists decide where to
submit their papers 


A study that ranks ecology journals on the basis of citation numbers and review times attracts 
attention online. 


Chris Woolston 
29 January 2015 


Deciding where to submit a scientific manuscript is a strategic decision for researchers who are 
balancing the desire for high impact with the need for quick publication. Two ecologists have tried to 
crack that conundrum by creating mathematical models that quantify the potential value of submitting to 
any of 61 ecology journals. In a study in PLoS ONE, they conclude that Ecology Letters, Ecological
Monographs and PLoS ONE are good choices for researchers seeking a large number of citations but 
relatively fast review times. Researchers on social media appreciated the effort. Ross Mounce, an 
evolutionary biologist at the University of Bath, UK, tweeted: 


Ross Mounce (@rmounce)

“Where Should I Send It? Optimizing the Submission Decision Process
journals.plos.org/plosone/articl ... HT @Protohedgehog @LaurenMaggio Super interesting!”

8:04 AM - 26 Jan 2015


But Marcel Holyoak, editor-in-chief of Ecology Letters, says that the analysis has limitations and is 
probably “most useful for a naive author without experience on which journals to target”. 


The PLoS ONE paper evaluated the 61 journals on the basis of data on acceptance rates and decision 
times provided by the publications (Nature was not included in the analysis because the journal did not
provide all the necessary data). One of the models is designed to maximize citations over a given time 
period, while taking into account the potential hassle and delay that comes with rejections and 
resubmissions. The model suggests that, in many instances, PLoS ONE would fall into a sweet spot that 
provides a moderate pay-off in citations and relatively little frustration. For example, submitting first to 
PLoS ONE instead of to a higher-impact journal such as Ecology Letters or Science might cost a paper 
up to 14 citations, but would probably avoid at least one revision while saving between 30 and 150 days 
of publication delay. The model also suggests that submitting to relatively low-impact journals, such as 
Wildlife Monographs or the Russian Journal of Ecology instead of PLoS ONE, would significantly reduce 
citations without much benefit in terms of waiting time or the chance of rejection. 
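

The trade-off that the model weighs can be illustrated with a toy calculation. The sketch below is
not the authors' model: it simply assumes that a paper is sent to a list of journals in order, moving
to the next one after each rejection, and that every journal has a fixed acceptance probability,
decision time and citation pay-off. The `Journal` class, the `evaluate_sequence` function and all of
the numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Journal:
    """Hypothetical per-journal parameters (illustrative, not from the study)."""
    name: str
    accept_prob: float   # probability that a submission is accepted
    review_days: float   # days from submission to a decision
    citations: float     # expected citations if the paper is accepted


def evaluate_sequence(journals: Sequence[Journal]) -> Tuple[float, float, float]:
    """Score a submission order.

    Returns (expected citations, expected days spent in review,
    probability that every journal in the list rejects the paper).
    """
    p_reach = 1.0      # probability the paper is still unpublished at this step
    exp_cites = 0.0
    exp_days = 0.0
    for journal in journals:
        # The review time is only incurred if all earlier journals rejected.
        exp_days += p_reach * journal.review_days
        # Citations are only earned if this journal accepts.
        exp_cites += p_reach * journal.accept_prob * journal.citations
        p_reach *= 1.0 - journal.accept_prob
    return exp_cites, exp_days, p_reach


if __name__ == "__main__":
    # Made-up numbers: a selective journal and a broad, faster one.
    selective = Journal("Selective", accept_prob=0.1, review_days=60.0, citations=40.0)
    broad = Journal("Broad", accept_prob=0.6, review_days=30.0, citations=15.0)

    for order in ([selective, broad], [broad]):
        cites, days, p_unpublished = evaluate_sequence(order)
        names = " -> ".join(j.name for j in order)
        print(f"{names}: {cites:.1f} citations, {days:.0f} days, "
              f"{p_unpublished:.0%} chance of no acceptance")
```

Under these made-up numbers, trying the selective journal first raises the expected citation count
but nearly triples the expected time spent in review, which is the kind of balance the study tries
to quantify.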


The paper offers an important lesson, says Mounce: submitting to top-tier journals isn’t always the best 
strategy, especially for scientists who need to rack up publications in a hurry for job promotions. “I know 
people who essentially had to quit academia because they chose the high-risk approach rather than just 
publishing their work in decent outlets,” he says. 


Jean-Michel Ané, a microbiologist and plant scientist at the University of Wisconsin–Madison, shared the 
article on the social publishing site Scoop.it with the comment: “It would be great if they could do that for 
other disciplines like ... microbiology and plant sciences for instance!” 


Ané said in an interview that choosing where to submit a paper can be a high-stakes game when 
careers rest on it. Unfortunately, he says, some of the key variables in the decision can be hard to 
assess. “A journal may say that they review a paper within 10 days, but the reality can be very different,” 
he says. “A database where those parameters could be compiled and updated regularly would be super 
useful.” 


Holyoak says that he’s glad Ecology Letters fared well in the analysis, but he also believes that the 
model “is a bit crude”, largely because it assumes that each article has the same quality, which he sees 
as an oversimplification. Santiago Salinas, an ecologist now at the University of the Pacific in Stockton, 
California, and co-author of the paper, acknowledges that the merits of a manuscript should play an 
important part in submission decisions. He adds that the model originally had a “paper quality” variable 
but it proved to be too subjective to be workable. “We tried to keep the model as flexible as possible,” he 
says. 


Many commenters on Twitter noted that the paper appeared in one of the journals that the model 
recommends. That’s not an accident, Salinas says. “We actually used what we learned from the model 
to submit to PLoS ONE,” he explains. “We followed our own advice.” 


Nature 518, 9 (05 February 2015) doi:10.1038/518009f 


SEVEN DAYS


US-budget hope 


US President Barack Obama’s budget plan for 2016 has given scientists a ray of hope, although the
proposals now face a rough road in Congress. The US$4-trillion budget proposal released on 2 February
offers $146 billion for scientific research and development. This is a 6% rise in the pot of money split
between civilian and defence programmes, and comes after years of austerity. The plan is expected to
meet with strong opposition in the Republican-controlled Congress. See page 13 for more.


Vaccination boost 


GAVI, an organization in Geneva, Switzerland, that supports vaccination in low-income countries, raised
US$7.5 billion during a 27 January pledging conference in Berlin. The money, largely from Western
governments, will pay for GAVI’s immunization efforts between 2016 and 2020 and will augment $2 billion
in previously committed funding. The extra cash should enable a further 300 million children to be
vaccinated, preventing some 6 million deaths, the organization says.


China biosafety lab

China opened its first biosafety level-4 research laboratory on 31 January, enabling work on dangerous
pathogens such as Ebola. The Wuhan National Biosafety Laboratory is funded and operated by the Chinese
Academy of Sciences. The lab was approved in 2003 after the SARS virus outbreak revealed shortcomings in
the country’s ability to deal with emerging diseases. China enlisted French assistance to design the
facility, which it plans to use to develop diagnostics and vaccines against highly infectious diseases.


Nuclear-waste plan is technically sound

The proposal to build a nuclear-waste repository at Yucca Mountain, Nevada, is technically sound, the US
Nuclear Regulatory Commission in Rockville, Maryland, said on 29 January. The 2008 application by the
Department of Energy was abandoned by the administration of US President Barack Obama in 2010, but a
federal court ordered the commission to continue with the licensing process as long as it had the funds
to do so. The commission notes, however, that construction would not have been possible (although a
railway to bring in waste is already in place; pictured) because the federal government had not secured
land and water rights from Nevada, which opposes the project.


Dusty death

A signal thought to be the first evidence of gravitational waves was caused by dust in the Milky Way
rather than being a relic of the Universe’s first moments. The team that described the signal in March
2014 withdrew its claim on 30 January. Combined data from the BICEP2 telescope at the South Pole and the
European spacecraft Planck revealed that the distinctive polarized light pattern spotted by the team was
almost entirely due to Galactic noise. See page 16 for more.


Warmest year

The World Meteorological Organization (WMO) has officially ranked 2014 as the warmest year since modern
temperature records began. According to an analysis of the three most widely used global climate data
sets, which have been collected since around 1880, the mean global air temperature last year was 0.57 °C
above the 14.00 °C average for the reference period 1961–90. Fourteen of the 15 warmest years on record
have now occurred in the twenty-first century, the WMO notes. The three hottest years (2014, 2010 and
2005) are only a few hundredths of a degree apart, less than the margin of uncertainty of the
measurements.


Ebola trials 

Plans to test an experimental 
Ebola drug in Liberia have 
been scrapped, a US drug 
company said on 30 January. 
Chimerix, based in Durham, 
North Carolina, ended a trial 
of the antiviral brincidofovir 
because it failed to enrol 
enough people infected in the 
now-waning Ebola epidemic 
in West Africa. On 2 February, 
a different clinical trial of two 
experimental vaccines against 
Ebola virus began in Liberia. 
This trial aims to enrol around 
27,000 people to determine 
whether the vaccines can 
prevent infection. 


DAVID HOWELLS/CORBIS


BUSINESS

Myriad settles suits

A long-running patent dispute over genetic testing is nearing its end. Myriad Genetics of Salt Lake City,
Utah, confirmed last week that it has settled lawsuits against three other medical diagnostics firms over
their use of genetic tests that analyse mutations in the BRCA1 and BRCA2 genes to estimate the risk of
breast and ovarian cancer. The dispute began in 2009, and in 2013 the US Supreme Court struck down some
of Myriad’s patent claims to the tests, but the company argued that it was protected by other patents.
Myriad is negotiating a settlement with the remaining four companies that it has sued over the tests.


PEOPLE 


Laser laureate dies 


Charles Townes (pictured), a US physicist who received a Nobel prize in 1964, died on 27 January, aged
99. While at Columbia University in New York in 1954, Townes built a device called a maser, which
stimulates atoms to emit coherent bursts of microwave radiation, the principle that led to the first
laser. After moving to the University of California, Berkeley, he used lasers to detect the first complex
molecules in interstellar space and to find evidence for the black hole at the centre of the Milky Way.


TREND WATCH

The US public generally supports science, but there seems to be a large gap between it and scientists on
some controversial issues. Of about 2,000 adults surveyed, 79% say that science has made life easier for
most people, found a poll by the American Association for the Advancement of Science (AAAS) and the Pew
Research Center, a think tank in Washington DC. But researchers are left questioning the gulf between
them and the public on certain topics. See go.nature.com/jnljfu for more.


Nobel chemist dies 


French chemist Yves Chauvin, who shared the 2005 Nobel Prize in Chemistry, died on 27 January, aged 84.
While working at the French Petroleum Institute in Rueil-Malmaison near Paris, Chauvin explained the
mechanism behind a catalytic chemical reaction called metathesis, during which two molecules joined by
doubly bonded carbon atoms swap their partners in a kind of dance. This understanding helped to develop
catalysts to make metathesis more controllable, and it is now used widely in the synthesis of plastics,
drugs and pesticides.


Pill creator dies

Chemist Carl Djerassi, widely known as the father of the birth-control pill, died on 30 January, aged 91.
He emigrated to the United States to escape Nazi Germany’s threat to his birthplace, Vienna, in 1939. Ten
years later, he joined the pharmaceutical firm Syntex in Mexico City, where he led the research team that
synthesized the first orally active steroid contraceptive, norethindrone. He later worked on biosynthesis
and analytical chemistry at Stanford University in California, and wrote dozens of short stories, novels
and plays.


Environment chief

The appointment of a scientist who has been dubbed gutsy and radical by the media as Communist-Party head
of China’s environmental protection ministry is raising hopes on combating pollution. Chen Jining,
currently president of Beijing’s prestigious Tsinghua University, was named as chief on 28 January and is
expected to become environment minister in March. There are high hopes that he will improve the
enforcement of China’s environmental regulations, in particular a law that calls for tighter monitoring
and punishment of polluters. China has often favoured economic growth over enforcing environmental laws,
especially at the level of local government. See go.nature.com/n4wjjn for more.


OPINION GAP

On hotly debated scientific issues, scientists and the public differ greatly, reveals a poll by the AAAS
and Pew Research Center.

[Bar chart comparing the views of scientists and the US public on six statements: Climate change is
mostly due to human activity; It is safe to eat genetically modified food; Humans have evolved over time;
More nuclear power plants should be built; Favour use of animals in research; Childhood vaccines such as
MMR should be required.]


9-11 FEBRUARY

Microbiologists, epidemiologists and policy experts will meet in Washington DC to discuss biological
threats such as Ebola, antibiotic resistance and bioterror attacks at the American Society for
Microbiology’s 2015 Biodefense and Emerging Diseases Research Meeting.
go.nature.com/xnstjz


11 FEBRUARY

The European Space Agency is to test its reusable spaceplane, the Intermediate Experimental Vehicle, or
IXV. The IXV will launch atop a Vega rocket from Europe’s Spaceport in French Guiana, and is expected to
return to Earth 100 minutes later.


Moon milestones 


Five teams picked up Google 
Lunar X Prize ‘milestone’ 
prizes on 26 January. The 
awards, worth a total of 
US$5.25 million, were 
established in 2014 to 
recognize steps towards the 
prize’s ultimate goal of landing 
a private spacecraft on the 
Moon by the end of 2016. The 
teams from Germany, India, 
Japan and the United States 
showed headway in landing, 
roving and imaging technology 
— albeit on Earth. Astrobotic, 
a spin-out company of 
Carnegie Mellon University 
in Pittsburgh, Pennsylvania, 
won prizes in all three areas. 
The awards were introduced 
after slow progress on the main 
$30-million challenge, the 
original deadline for which 
was in 2012. 


Obama budget seeks big boost for science 


White House plan would increase research and development funding but faces rough road in Congress. 


Boer Deng, Richard Monastersky, Lauren Morello, Sara Reardon & Jeff Tollefson 


03 February 2015 




J. Scott Applewhite/AP 


US President Barack Obama’s budget aims to raise spending on climate and biomedical research, among 
other areas. 


When US President Barack Obama released his budget proposal on 2 February, he gave scientists and 
engineers a ray of hope — albeit one that is almost certain to be dimmed, if not extinguished. 


Obama’s US$4-trillion plan for fiscal year 2016 includes $146 billion for scientific research and
development, a healthy 6% increase for a portfolio split roughly evenly between defence and civilian
programmes. The proposal, which seeks to turn back years of fiscal austerity, is the opening salvo in
what is likely to be a long war with the Republican-controlled Congress over government spending.


“The problem, fundamentally, is that this budget is dead on arrival,” says Michael Lubell, director of
public affairs at the American Physical Society in Washington DC.


But he says that Obama may yet succeed in convincing lawmakers to lift the spending caps known as
sequestration, which were put in place in 2011.
Congress agreed to ease the caps in 2014 and 2015, but a similar agreement for 2016 may be harder 
to broker. Republicans have expressed interest in ending limits on defence spending, and are seeking to 
compensate for that by making cuts to civilian programmes. 


“We need a real budget, one that allows responsible investments in critical federal programs — including 
our national defense — without breaking the bank and pushing our country further into deficits and 
debt,” said Hal Rogers (Republican, Kentucky), chairman of the House of Representatives 
appropriations committee, in a written statement. 


In the meantime, the president is moving ahead with a budget request that aggressively lays out his 
priorities for the twilight of his term in office, which ends in January 2017. One top concern is climate 
change, an area of sharp disagreement between the White House and the Republican-led Congress. 
The multi-agency US Global Change Research Program would receive a 9% increase in 2016, to $2.7 
billion. 


Climate and energy 

The US Environmental Protection Agency (EPA) would see an increase of roughly 6%, driving its 
budget to $8.6 billion — including $769 million for science and technology (see ‘Budget highlights’). The 
agency would receive $239 million to carry out climate-change regulations and initiatives, and 
$25 million to help states to comply with a rule (expected to be finalized this year) that would limit 
greenhouse-gas emissions from power plants. The budget would also create a $4-billion fund to help 
states that want to enact even stricter emissions limits on the power sector. 


Budget highlights
How science agencies fared in the budget (US$ millions).

Agency                                            2014      2015        2016        Details
                                                  actual    estimated   requested

Biomedical research and public health
National Institutes of Health                     30,070    30,311      31,311      Increases spending on antimicrobial resistance by $100 million
Centers for Disease Control and Prevention         5,863     6,073       6,170      Includes $12 million to help developing countries to improve disease surveillance
Food and Drug Administration                       2,561     2,596       2,744      Proposes a new agency to monitor food safety, currently a joint responsibility of FDA and agriculture department

Physical sciences
National Science Foundation                        7,131     7,344       7,724      Asks for a 5.2% rise from 2015
NASA (science)                                     5,148     5,245       5,289      Seeks $30 million to formulate a mission to the Jovian moon Europa
Department of Energy Office of Science             5,066     5,068       5,340      Allocates $2.7 billion to clean-energy and energy-efficiency programmes
National Institute of Standards and Technology       850       864       1,120      Nearly doubles funding for programmes to support manufacturing

Earth and environment
Environmental Protection Agency                    8,200     8,140       8,592      Allots $239 million to climate-change regulations and initiatives
National Oceanic and Atmospheric Administration    5,323     5,449       5,983      Allocates $2.4 billion to Earth-observing satellites
US Geological Survey                               1,032     1,045       1,195      Spends extra $32 million on science to increase climate resilience and adaptation

Source: White House Office of Management and Budget, US Public Law No. 113-235
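The percentage rises quoted in the article can be checked against the 2015 and 2016 columns of the budget table. The short sketch below does that arithmetic for three agencies. The figures are the table's own (in US$ millions); the function name and the choice of agencies are illustrative.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# (2015 estimated, 2016 requested) in US$ millions, taken from the budget table.
budgets = {
    "National Science Foundation": (7344, 7724),
    "Environmental Protection Agency": (8140, 8592),
    "US Geological Survey": (1045, 1195),
}

for agency, (estimated_2015, requested_2016) in budgets.items():
    change = percent_change(estimated_2015, requested_2016)
    print(f"{agency}: {change:+.1f}%")
```

The results agree with the rounded figures in the text: a 5.2% rise for the NSF, roughly 6% for the EPA and 14% for the US Geological Survey.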


Republicans are already promising to go after the EPA budget as they seek to head off the agency’s 
new climate regulations. “This is a giant press release,” says Frank Maisano, a lobbyist at Bracewell & 
Giuliani in Washington DC who represents energy-industry clients. “It’s just the first marker in what is 
going to be a very long discussion over the next year.” 


The White House plan also underscores the Obama administration’s long-standing emphasis on clean 
energy. Funding for the Department of Energy’s (DOE’s) clean-energy technology and energy-efficiency 
programmes would rise by more than $800 million, to $2.7 billion, with sharp increases for clean-vehicle 
and building technologies and advanced manufacturing. 


The DOE’s high-risk, high-impact research agency, the Advanced Research Projects Agency-Energy, 
would receive $325 million, a $45-million increase from 2015. The department’s budget also includes 
$560 million for research related to fossil fuels, including work on carbon capture and sequestration, and 
$908 million for nuclear-energy research. 


Overall funding in the DOE Office of Science would increase by more than 5%, to $5.3 billion, with most 
of the increase in basic energy sciences and advanced computing. The budget would slash the fusion- 
energy programme and prevent payments to the international fusion-reactor project ITER, which is 
behind schedule and over-budget, unless management reforms are undertaken. 


Space and Earth sciences 

NASA’s budget would increase by $500 million to $18.5 billion, with its science funding holding steady at 
around $5.3 billion. The agency’s Earth-science programme would get a major boost, increasing by 
roughly 10% to $1.9 billion. Some of that money would be used to begin planning Landsat-9, the next 
probe in a series of satellites that has monitored land use and land-cover change since 1972. NASA 
would also assume responsibility for climate satellites currently overseen by the National Oceanic and 
Atmospheric Administration (NOAA). 


The White House plan seeks $1.4 billion for planetary science, a drop of roughly 5%. That pot includes 
funds to capture and study a small asteroid by moving it into the Moon’s orbit. NASA is also requesting 
$30 million to begin formal planning of a mission to Jupiter’s moon Europa. The mission — to a body 
whose icy crust covers a watery ocean that could, perhaps, support life — has historically won only 
lukewarm support from the White House but received much stronger backing from Congress, which set 
aside $100 million for the plan in 2015. 


“The increase in the Earth-science budget, coupled with the decrease in the planetary-science budget, 
is just setting the administration up for a long debate with Congress,” says Marcia Smith, a space- 
programme analyst and founder of SpacePolicyOnline.com. 


The NASA request would boost funding for the Stratospheric Observatory for Infrared Astronomy from 
$70 million in 2015 to $85 million in 2016. The White House had sought to cut funding for the 
programme entirely last year, but supporters in Congress kept the specially outfitted Boeing 747 flying. 


NASA is seeking to end support for two long-running programmes in 2016: the Mars Opportunity rover, 
which has operated on the red planet since 2004, and the Lunar Reconnaissance Orbiter, which has 
mapped the Moon’s surface since 2009. The agency’s chief financial officer, David Radzanowski, says 
that NASA will decide in the summer whether to reconsider and continue operating the ageing 
Opportunity, but there is no guarantee that it will set aside money to do so. 


NOAA would receive just under $6 billion, up from $5.4 billion in 2015. More than one-third of the
budget, about $2.4 billion, would go to the agency’s weather and climate probes, including
$380 million to develop a mission to avert a potential data gap in the polar-orbiting satellite programme.


Obama’s budget also revives his failed 2012 proposal to move NOAA from the business-focused Department
of Commerce to the Department of the Interior. The interior department has much in common with NOAA: it
conducts environmental and climate research, oversees some fisheries and regulates ocean oil and gas
drilling. But merging NOAA into the department is likely to remain a tough sell, even within Obama’s
administration. “We have not been advocating for movement of NOAA,” interior secretary Sally Jewell told
reporters on 2 February.


NASA/JPL/DLR


Jupiter’s moon Europa, target of a NASA mission.


The US Geological Survey would see its budget increase by 14% over 2015, to $1.2 billion, with an extra
$37.8 million for tools to support land management and a $32-million boost for work on climate resilience
and adaptation.


Biomedical sciences 
For the first time in years, the US National Institutes of Health (NIH) was slated for a sizeable budget 
increase — to $31.3 billion, $1 billion more than it received in 2015. 


“This is exciting,” says Stefano Bertuzzi, executive director of the American Society for Cell Biology in 
Bethesda, Maryland. “In the scheme of everything, it brings us back to where we were before 
sequestration.” 


The White House has been particularly active in creating biomedical research programmes in recent 
months, requesting $215 million for a new Precision Medicine Initiative that would integrate health and 
genetic data from at least one million volunteers into a huge database, informing efforts to precisely 
tailor treatments to individuals (see Nature http://doi.org/zvh; 2015). That programme, to the relief of 
many researchers concerned about funding, will not be paid for out of current research programmes at 
agencies such as the NIH. 


The budget proposal also includes funding to implement a September executive order that charged an
interagency working group with developing a national strategy to combat antibiotic-resistant bacteria.
The 2016 budget request nearly doubles US government spending on antibiotic-resistance programmes to more
than $1.2 billion, with $461 million of that going to the NIH for projects such as developing new
antimicrobial agents, and basic research characterizing how resistance evolves. The Biomedical Advanced
Research and Development Authority would receive $192 million to develop and stockpile those new drugs,
and $280 million would go towards activities such as surveillance of resistant strains by the Centers for
Disease Control and Prevention (CDC) in Atlanta, Georgia.


Current NIH programmes would also benefit under the plan. The agency’s budget for its share of the 
Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative would more than 
double, from $64 million in fiscal year 2015 to $135 million this year. 


The NIH budget request is still haunted by the ghost of the longitudinal National Children’s Study, which 
had received a total of $1.2 billion in funds before the NIH cancelled it in December 2014. Yet the 2016 
budget requests another $165 million for that specific study, or for related research to characterize 
environmental influences on children’s health. Because the study’s cancellation was so recent, there 
was not time to remove it from the budget, says NIH director Francis Collins. The agency is currently 
mulling what to do with the $165 million that was appropriated for the children’s study in 2015. 


Although the Ebola epidemic in West Africa is waning, the CDC would get $294 million more than in 
2015 to study emerging zoonotic infectious diseases, giving a total of $699 million for these activities. 
The agency would also receive a $12-million increase for its Global Health Security Agenda, which 
would help developing countries to improve surveillance and detect future outbreaks before they get out 
of control. 


The president’s budget also proposed to combine the food-safety offices at the Food and Drug 
Administration and the US Department of Agriculture (USDA) into a single agency within the 
Department of Health and Human Services (HHS). Ellen Murray, assistant secretary for financial 
resources at the HHS, says that the department has worked out few details about the proposed agency. 


Agricultural research was another big winner, with an 18% rise over 2015. A significant slice of that 
increase is a $450-million request for the USDA’s Agriculture and Food Research Initiative. That money 
would pay for competitive research grants in areas such as water quality, bioenergy, food safety and 
sustainable agriculture. 


Basic research 

The budget for the National Science Foundation (NSF) would grow to $7.7 billion, roughly a 5% 
increase from 2015 and a “very sustainable number for maintaining a healthy research environment”, 
says Meghan McCabe, a legislative-affairs analyst at the Federation of American Societies for 
Experimental Biology in Bethesda, Maryland. 


Of note were several new and continuing cross-disciplinary initiatives. The NSF is seeking $144 million 
to fund its part of the multi-agency BRAIN Initiative, a 35% increase over what was implemented in 
2015. Three other prioritized initiatives would stimulate cross-disciplinary research in food, water and 
energy systems; risk and resilience planning; and increased diversity among students in science, 
technology, engineering and mathematics. 


The amount allocated to new facilities remained flat, and this year, as last year, the NSF did not request 
funds for new building projects. The agency will continue to fund three ongoing construction projects: the 
Daniel K. Inouye Solar Telescope, the Large Synoptic Survey Telescope and the National Ecological 
Observatory Network (NEON). However, the NSF is planning to explore new projects for the future, 
because construction of NEON will end in fiscal year 2016. The need for research and planning for 
those future projects is reflected in the NSF facilities budget, which, although declining slightly from $201 
million in 2015 to $200 million in 2016, puts 50% more towards concept development and planning. 


Another major presidential priority is innovation, particularly in defence, nanotechnology and 
manufacturing. The Defense Advanced Research Projects Agency, which supports some of the most 
audacious government-funded research, would increase by $101 million to $3 billion. A total of 
$1.5 billion would fund work across the government through the National Nanotechnology Initiative. And 
the president will seek to expand the national network of manufacturing-innovation institutes, something 
he has pushed for in previous budgets. 


Gravitational-wave hunt enters next phase 


A landmark result has turned to dust, but the search for primordial cosmic ripples continues. 


BY ELIZABETH GIBNEY 


When the physicists who run the South Pole-based radio telescope BICEP2 found what seemed to be a signal
from the dawn of time, it felt like a slam dunk. They were hunting a pattern in cosmic radio waves that
would confirm that the Universe had expanded rapidly in the first fraction of a second of its existence,
an extremely faint signal if it exists at all. “We all thought this was the kind of problem that you’d
have to struggle with for a long time,” says Colin Bischoff, a cosmologist at the Harvard-Smithsonian
Center for Astrophysics in Cambridge, Massachusetts, and member of the BICEP2 team. For once, it seemed,
nature was being kind.

But it was not to be. Soon after the announcement was made, to great fanfare, in March last year,
questions started to emerge. The BICEP2 analysis hinged on a twisted pattern detected in the polarization
of the cosmic microwave background (CMB), light left over from the Big Bang. The team attributed the
twists, known as B modes, to gravitational waves, which are ripples in space-time generated during the
earliest moments of the Universe. Cosmologists think, but have never proved, that during this period the
cosmos underwent a brief but tumultuous episode of expansion known as inflation. In September, however,
results from the European Space Agency’s Planck space telescope suggested that dust, which can also emit
polarized light, could instead account for the BICEP2 signal.

Last week, those fears were confirmed with the long-awaited announcement of results from a joint analysis
carried out by BICEP2 and Planck (see go.nature.com/muyr3z). On 30 January, researchers from both teams
described how they had overlaid data recorded by BICEP2 with data from the same patch of sky recorded by
Planck at a higher frequency, where the signal is almost entirely attributable to dust. Subtracting the
portion of the signal known to be a result of dust left only a tiny excess, with a statistical confidence
far below the level needed for a significant finding. There is now no reason to believe that BICEP2 saw
anything but dust.

Still, it may yet be possible to find evidence of gravitational waves in the CMB by exploring the
existing excess in more detail, and by scanning the sky at different frequencies. The team has learned
lessons from its own findings and from Planck’s, and is applying them in new detectors.

On a bright day in December 2014, as the BICEP2 team was finalizing the joint paper with Planck, Bischoff
was in a huge but near-empty hangar at Harvard, finishing upgrades to one of the five telescopes that
together make up what is known as the Keck Array, also at the South Pole. Results from the array, which
first started taking data in 2011, were rolled into the latest joint analysis, but will also be crucial
to the next phase of the gravitational-wave hunt.

Each of Keck’s five telescopes is individually as sensitive as the entire BICEP2 telescope, allowing the
array to measure the faint signals (detected as tiny temperature differences) with even greater
precision. A new role for Keck will be to look for dust: two of its telescopes have been tuned to detect
polarized light at a higher frequency than BICEP2, at which dust is expected to polarize the CMB more
strongly. A measurement of dust-based polarization more precise than Planck’s could give statistical
weight to the small excess signal that remains in the BICEP2 data, if that signal is, after all, from
primordial gravitational waves.

The array will be assisted by BICEP2’s replacement, BICEP3, which packs the same sensitivity as the Keck
Array into a single unit. It has already started collecting data and will be fully operational by the end
of February, when the Antarctic summer finishes. Like BICEP2, BICEP3 will look for B modes caused by
gravitational waves, but with greater precision and sensitivity, allowing it to detect ever fainter
imprints in the CMB. And it will search the sky at 95 gigahertz (GHz), a lower frequency than its
predecessor: the joint analysis with Planck, which scans the sky at a variety of frequencies up to
857 GHz, suggested that dust should have less of an effect at 95 GHz, making it more of a “sweet spot”
for seeing a primordial signal, says Bischoff.

Antarctica is the perfect place to look for 
tiny twists in cosmic light. Not only is the 
continent blessed with relatively clear skies, 
but its dry climate means that there is much 
less water in the air to absorb microwaves. 
Bischoff’s colleagues are now working 
feverishly through the remaining Antarctic 
summer to get the detectors ready. Even the 
isolation can be an advantage, says Bischoff: 
“It’s a good place to get work done, and it’s
pretty beautiful.”

But BICEP3 and the Keck Array might 
be beaten by a rival. They point at the same 
region of sky as BICEP2, which turns out 
to be more polluted with dust than once 
thought. More luck might be had by the 
South Pole Telescope (SPT), which is less 
sensitive but scans the sky more widely and 
at a higher resolution, or the POLARBEAR
telescope, installed in 2012 at the James Ax 
Observatory in Chile, says Peter Coles, a 
cosmologist at the University of Sussex, 
UK. “I wouldn't like to pick the likely win-
ner,” he adds. The SPT and BICEP teams
are also working on a joint analysis: if the 
primordial signal is very weak, it will be 
harder to differentiate from another source 
of B modes known as gravitational lensing, 
which the SPT is optimized to study. 

Anthony Challinor, a cosmologist at the 
University of Cambridge, UK, is upbeat 
about the BICEP team’s chances. The 
researchers’ growing experience in untan- 
gling the CMB and the results obtained at 
varying frequencies put them in good stead.
“This is a very competitive field and the 
competition is catching up, but the BICEP 
team is still ahead of the game.” 

How did the drama of the discovery 
affect the team? Acknowledging “some 
ups and downs” in 2014, Bischoff shrugs 
his shoulders. “Looking back, we probably 
could have been more cautious,” he says. 
“But even with a low-key announcement 
there still would have been a large reaction 
one way or another.” Despite the attention,
he says, “I feel like mostly we've kept pretty
steady”.


Additional reporting by Ron Cowen. 
Silicene makes its
transistor debut 


Creation of electronic device using atom-thin silicon sheets 
could boost work on other flat materials. 


BY MARK PEPLOW 


Seven years ago, silicene was little more
than a theorist’s dream. Buoyed by a
frenzy of interest in graphene — the
famous material composed of a honeycomb 
of carbon just one atom thick — researchers 
speculated that silicon atoms might form simi- 
lar sheets. And if they could be used to build 
electronic devices, these slivers of silicene 
could enable the semiconductor industry to 
achieve the ultimate in miniaturization. 

This week, researchers took a significant 
step towards realizing that dream, by unveil-
ing details of the first silicene transistor (ref. 1).

Although the device's performance is modest, 
and its lifetime measured in mere minutes, this 
proof of concept has already been causing a stir 
at conferences, says Deji Akinwande, a nano- 
materials researcher at the University of Texas at 
Austin who helped to make the transistor. Guy 
Le Lay, a materials scientist at Aix-Marseille 
University in France, agrees. 

“Nobody could have expected that in such 
a short time, something that didn't exist could 
make a transistor,” he says.

Le Lay was one of the first scientists to create 
silicene in the lab (ref. 2), in 2012 (see ‘The rise of
silicene’). The debut coincided with a growing 
sense that graphene was unsuitable for making 
transistors. Graphene may be the world’s most 
conductive substance, but it is missing a crucial 
characteristic. Unlike the semiconductors used 
in computer chips, it lacks a band gap — the 
energy hurdle that electrons must vault before 
they can carry a current. Band gaps enable 
semiconductor devices to switch on and off 
and to perform ‘logic’ operations on bits. 

“For logic applications, graphene is hope- 
less,” says Le Lay. By contrast, silicene can 
boast a band gap, because some of its atoms 
buckle upwards to form corrugated ridges, 
which puts some of its electrons in slightly 
different energy states. What is more, makers 
of electronic chips have been wary of ditching 
decades of silicon-manufacturing experience 
in favour of carbon, says Lok Lew Yan Voon, a 
theoretical physicist at the Citadel, the Military 
College of South Carolina in Charleston, who 
first named silicene and modelled its proper- 
ties back in 2007 (ref. 3). 

THE RISE OF SILICENE

Its carbon-based cousin graphene gets more attention, but silicene is catching up.

| 1994 | First calculation of the structure of two-dimensional crystals of silicon (pictured) and of germanium.
| 2004 | Andre Geim and Konstantin Novoselov report isolation of graphene.
| 2007 | The name ‘silicene’ is coined.
Fabrication of silicene nanoribbons; flurry of theoretical papers on silicene and germanene begins.
| 2010 | Geim and Novoselov bag Nobel Prize in Physics for their experiments on graphene.
| 2012 | Six independent reports of silicene sheets (formed on silver).
| 2015 | First demonstration of silicene transistor.

But handling silicene in the lab has been a
huge challenge. The material cannot be peeled
from a solid block using sticky tape, as graphene
can from bulk graphite. Instead, researchers 
produce it by letting a hot vapour of silicon 
atoms condense onto a crystalline block of silver 
in a vacuum chamber, a much trickier process. 
And unlike robust graphene, naked silicene is 
extremely unstable in air, making it difficult 
to transfer the gossamer sheet to more useful 
substrates — such as the guts of a transistor. As 
recently as last year, some researchers were still 
questioning whether silicene even existed. 

So Akinwande joined forces with Alessandro 
Molle at the Institute for Microelectronics 
and Microsystems in Agrate Brianza, Italy, to 
offer silicene some protection. They formed 
a silicene sheet on a thin layer of silver, and 
added a 5-nanometre-thick layer of alumina
on top. Then they peeled this silicene
sandwich off its mica base, flipped it silver- 
side-up, and laid it on an oxidized-silicon 
substrate. Finally, they gently etched away 
some of the silver to leave two islands of 
metal as electrodes, with a strip of exposed 
silicene between them. 

“It's a very clever trick,” says Le Lay, who is
planning to try the process with germanene, 
a capricious, similarly structured ‘two-
dimensional’ material made from germa-
nium that his team created last year (ref. 4).

Clever it may be, but the transistor will 
not be making an appearance in mobile 
phones any time soon: the exposed silicene 
degrades in about two minutes. Still, that 
is long enough to measure its properties. 
Although its electrons are sluggish in 
comparison to graphene’s, the device does 
indeed have a small band gap. 

Laying an extra coating on top of the 
silicene transistor could also extend its life. 
Akinwande has used Teflon to help flakes 
of phosphorene — another air-sensitive, 
two-dimensional material, made of phos-
phorus — to survive for months (ref. 5). Other
researchers have shown that using multiple 
layers of silicene could allow the sacrificial 
top layers to protect 


those beneath for “It’sdefinitely 
24 hours®. Crucially, a game- 3 
the technique used changer . This 
to make the silicene 18 the paper 
transistor couldnow we’ve been 
help totestall ofthese waiting for.” 


ideas, and more, with 

various air-sensitive materials. “It’s defi- 
nitely a game-changer,’ says Lew Yan Voon. 
“This is the paper we've been waiting for in 
the field” 

Not everyone is so enthusiastic about 
silicene’s prospects. “There’s a lot of talk 
about silicene, germanene and phos- 
phorene,” says Jari Kinaret of Chalmers 
University of Technology in Gothenburg, 
Sweden, who is the director of the European 
Union’s Graphene Flagship, a €1-billion 
(US$1.1-billion) research project to study 
two-dimensional materials and develop 
applications for them, “but the difficulties 
with them are still quite substantial.” 

Le Lay, however, is convinced that 
researchers will flock to silicene. “Now
that a device has been made,” he says, “other
scientists will see it is not a dream material,
it is a practical thing.”
SAUDI ARABIA

Science oasis under pressure

But scientists at Saudi Arabia’s leading university argue for a
quiet approach to modernizing the nation.

BY DECLAN BUTLER

After the high-profile flogging of
Saudi Arabian activist Raif Badawi last
month, the King Abdullah University of
Science and Technology (KAUST) — a multi-
cultural, world-class university in what seems
an unlikely setting — is in the spotlight.

Badawi received 50 lashes, the first in a sen-
tence that stipulates a total of 1,000 lashes plus
10 years in prison, as punishment for a website
that he created for social and political discus-
sion. As well as prompting an international
outcry, the case has put KAUST’s leaders under
pressure to speak out about the lack of freedom
of expression in Saudi Arabia, where KAUST is
based. Researchers at the university, however,
argue that they can have a bigger impact on
Saudi society — and perhaps on the Arab and
Muslim world broadly — by quietly continuing
in their efforts to create a world-class centre for
research and critical thinking.

“KAUST is built on values that I espouse
as a scientist, and the impact of KAUST will
be felt over time, in major part through the
influence of its graduates,” says Mark Tester,
an Australian who is associate director of
KAUST’s Center for Desert Agriculture.

A graduate university, KAUST was founded
in 2009 by the late King Abdullah, with the
goal of establishing a culture of science and
enlightenment in Saudi Arabia and beyond.

A stark exception to strict Saudi society, 
its campus in Thuwal, 90 kilometres north 
of Jeddah, imposes no discrimination on the 
basis of sex, religion or ethnicity. Unlike in the 
rest of the country, women and men mingle, 
and women can also drive. The freedoms on 
the campus were a condition of the promi- 
nent Western scientists who backed KAUST’s 
development. 

On 18 January, nine days after Badawi 
received the first lashes, 18 Nobel prizewinners 
from around the world wrote to Jean-Lou 
Chameau, the president of KAUST, calling on 
“influential voices in KAUST” to speak up for 
the freedom to dissent. The letter warns that 
KAUST’s international ties could be at risk if 
the restrictions on freedom of thought and 
expression in Saudi Arabia continue. 

One researcher familiar with KAUST, who 
requested anonymity because of the sensitiv- 
ity of the issues, says that if KAUST research- 
ers were to speak out or be politically active, 
it would have little effect on the regime and 
would risk providing ammunition for the 
institution’s critics in Saudi Arabia. KAUST 
is controversial there, the researcher says,
and the state and clerics have sought to keep
the university and its scientists at a distance 
from domestic political or social issues. “It is 
always under a microscope from conservative 
elements,” says the researcher. 

Scientists can do more for Saudi Arabia by 
working at KAUST than by criticizing it from 
outside, says Tester. “We are making a real con- 
tribution to the country through education, 
and through research advances.” 

KAUST has attracted leading scientists from 
around the world to join its faculty of around 
130, and has set up science centres to study 
regionally important issues such as desert 
agriculture, Red Sea research, desalination and 
solar energy. The campus now hosts 840 stu- 
dents from 69 countries, including 246 from 
Saudi Arabia and 302 women. 

“My philosophy is that I don’t think I'm 
compromising, but modestly contributing 
to opening up things,” says another foreign 
researcher who requested anonymity. 

Indeed, much of the international support 
that was crucial for KAUST’s development 
came with the understanding that Saudi Arabia 
would improve freedoms beyond the campus 
site. Yet, as the case of Badawi highlights, if any- 
thing, the kingdom seems to have stepped up 
its repression of freedoms since KAUST was 
founded. 

There was a “spirit of hope” when KAUST 


opened, says letter co-signatory John Polanyi, 
who won the 1986 Nobel Prize in Chemistry. 
But patience with the Saudi Arabian regime 
is now “wearing thin”. “I think the scholarly 
community has been slow to become aware 
that KAUST cannot be an island of freedom,”
he says.

Tester argues that KAUST is educating a new
generation of Saudi students, who will eventu-
ally help to transform the kingdom more
generally. “Our existence is evidence of the
kingdom’s desire to develop,” he says. “It will
take time, and I ask that people give us time.”

KAUST is not the only academic force
for change in Saudi Arabia. A multibillion-dollar
scholarship programme launched by King 
Abdullah in 2005, and set to continue until 
2020, funds hundreds of thousands of Saudi 
undergraduates and postgraduates to study 
abroad. Scientists familiar with Saudi Arabia 
say that they suspect Abdullah’s plan was to 
produce a delayed benefit: after being exposed 
to alternative ideas and cultures, returning stu- 
dents would moderate Saudi society and ease 
the grip of conservative clerics. Education in the 
kingdom is heavy on religion. “It’s more Koran
than periodic table,” says one researcher.

Although large numbers of Saudi men have 
long had Western educations, one big differ- 
ence that the programme provides is that it is 
open to women. “That is what will be trans-
formative,” says another foreign scientist who
has worked closely with KAUST. “But it’s not
going to happen overnight.”

Still, scientists inside and outside KAUST 
agree that the establishment of a knowledge- 
based culture and economy will require reforms 
by the Saudi leadership too. “The whole idea 
behind KAUST was that King Abdullah wanted 
Saudi Arabia brought back into the mainstream 
of science,” says one anonymous scientist. But
modern science requires free thinking and 
creativity, and cannot flourish in a repressive 
culture, adds the researcher. “If Saudi Arabia 
is to take its place on the modern science and 
technology scene it really has to pay attention 
to its human rights.”


CORRECTION 

The News story ‘Rave drug tested against 
depression’ (Nature 517, 130-131; 2015) 
stated that ketamine acts by blocking 

the signalling molecule NMDA. The drug 
actually acts on the NMDA receptor. 
DITCH THE DRIVER

NO DRIVERS REQUIRED

Automation is one of the hottest topics in transportation research and could yield completely driverless cars in less than a decade.

… traffic fatalities every year worldwide
… of all accidents are due to driver error
… and the District of Columbia have passed laws to allow driverless cars on their roads

BY M. MITCHELL WALDROP

This summer, people will cruise through the streets of Greenwich, UK, in electric shuttles with no one’s hands on the steering wheel — or any steering wheel at all. The £8-million (US$12-million) project, part of a larger study of driverless cars funded by the UK government, is just one of many efforts that seek to revolutionize transportation. Spurred in part by a desire to end the carnage from road accidents — about 90% of which are caused by driver error — the race is on to transfer control from people to computers that never doze at the wheel, get distracted by text messages or down too many pints at the pub.

Almost every major car maker is working on some form of automation, as are many electronics companies. But looming over everyone is the Internet giant Google: the company has been widely acknowledged as the world leader in driverless-car research since October 2010, when it announced that it had entered the field a year earlier — and that its driverless test vehicles had already logged more than 200,000 kilometres on roads near its headquarters in Mountain View, California, and elsewhere in the state. The public’s enthusiastic response to that revelation galvanized car makers and government research-funding agencies around the world to accelerate their efforts in this arena.

“I’ve never seen anything move so quickly from concept into products,” says Richard Bishop, an automotive consultant who headed a US Department of Transportation research programme on automated motorways in the 1990s. Although many technical challenges remain, developers say they can see clear paths for solving most or all of them.

At Google, for example, most of the driverless-car work so far has been carried out using standard passenger vehicles fitted with Global Positioning System receivers and mapping technology, along with radar to detect obstacles, a laser ranging system to scan the surroundings in three dimensions, and video cameras to identify objects such as traffic lights, construction signs, pedestrians and other vehicles (see ‘A world of driverless cars’). The on-board computer — with processing power equivalent to several desktop units — integrates all the information and decides how the car should behave in any given situation. To lessen the load on the driving algorithms, Google equips the car with ultra-detailed maps that tell it exactly what to expect, down to the height of every curb.

Sceptics point out that this mapping requirement restricts the car to places that have already been surveyed to that level of precision, such as Mountain View. But it would be relatively easy to expand those maps as part of the company’s ongoing efforts to photograph the world’s roadways for Google Maps, says Sebastian Thrun, an engineer who founded the Google car project and ran it until 2013.

A tougher problem, says Thrun, is teaching the car how to respond to what he calls “the long tail of unlikely events”. Early on, he says, the Google team developed algorithms for handling frequent, obvious challenges such as intersections or rain-slicked roadways. But as the cars drove for thousands of kilometres, they recorded oddball events such as a plastic bag blowing across the motorway or a couch sitting in the middle of the road. “There were many more of those than we believed in the beginning,” says Thrun. The only way to handle such rare events has been to record them as they arise, devise responses with the help of high-powered machine-learning algorithms — and then test those solutions with simulations and yet more driving.

“If we do it long enough,” says Thrun, “the hope is that the software will be as safe as a human driver” — and eventually much safer. How long that will take remains an open question. Google has publicly estimated about five years — but the company is currently not granting interviews about its project.

Additional safety could come from equipping cars and trucks with Wi-Fi-like vehicle-to-vehicle (V2V) radios, which would allow them to warn each other of dangerous situations such as a car running through a red light. That would give both driverless and human-operated vehicles time to steer clear.

Although V2V technology will probably be incorporated into driverless vehicles, it has been developed largely through separate efforts. The concept has already been road-tested as part of the European Union's Safe Road Trains for the Environment project, in which lines of cars followed bumper to bumper behind a truck, like ducklings tailing their mother. These road trains, or platoons, avoid catastrophic pile-ups because the V2V signals cue every car to hit the brakes at the same instant as the truck. And because of aerodynamics, the road trains saved at least 10% in fuel consumption.

Such experiments have piqued the interest of the car maker General Motors, which last September announced it will support V2V technology in future models. At first, such cars will have few other vehicles to talk to. But the US National Highway Traffic Safety Administration is looking to issue regulations requiring V2V radios in all new US cars later this decade.

“The vehicles would just be broadcasting a basic safety message: position, speed, direction of travel,” says Josh Switkes, chief executive of Peloton Technology, a start-up firm in Menlo Park, California, that is seeking to commercialize V2V technology for heavy trucks. But that will be enough to eliminate many accidents, he says. Then there are the energy efficiencies — not just from platooning, he says, but from cars working together to minimize stop-and-go traffic, where fuel efficiency is dismal. If the cars can also ask smart traffic lights to adjust themselves to prevailing traffic density — a practice sometimes known as Vehicle to Infrastructure (V2I) — then the system might be able to minimize the need to stop at all.

Ultimately, the timescale for deploying these technologies will depend on the answers to much broader questions. How much will they cost? Who will own them — individuals, or service companies that provide transportation on demand? Who will face legal liability when a driverless car gets into an accident? And will people accept and trust them?

Such questions can only be answered through experience. And given the pace of innovation today, experience is accumulating fast.

M. Mitchell Waldrop is a features editor for Nature in Washington DC.
A WORLD OF DRIVERLESS CARS

Fully autonomous vehicles are developing faster than anyone would have thought a few years ago, with many experts predicting that they will become widely available in the next 5-10 years. Many questions remain, but it is already possible to imagine how this new world of driverless cars will work.

ADAPTIVE TRAFFIC FLOW
Smart infrastructure integrates V2V signals from the moving cars to optimize speed limits, traffic-light timing and the number of lanes in each direction on the basis of the actual traffic load. The result is a smoother flow, shorter travel time and less energy wasted at traffic lights or in traffic jams.

DECISION AND ACTION
To make the appropriate responses to rare events — such as a ball bouncing in from a playground, or a plastic bag blowing down the roadway — the cars rely on algorithms refined through millions of kilometres of test drives.

PERCEPTION
Vehicles use radar to detect obstacles, a laser ranging system to map the surroundings in three dimensions, and video cameras to identify objects such as traffic lights, construction signs, pedestrians and other vehicles.

ROUTE PLANNING
An on-board computer uses sensor data to plot a route that gets the car where it needs to go, while avoiding people, potholes and other vehicles.

LOCATION
Mapping software uses Global Positioning System data to tell the car where it is in relation to roads, traffic signals, and other landmarks.

COMMUNICATION
Vehicle-to-vehicle (V2V) radios send signals between cars, trucks and infrastructure items such as traffic lights.

ROAD TRAINS
Vehicles can take advantage of aerodynamics and save fuel by following one another almost bumper to bumper. They are protected from catastrophic pile-ups by their V2V radios, which allow all the cars in line to hit their brakes at the same time.

… Fuel savings for cars that travel in formation.

CITIES TRANSFORMED

MASS TRANSPORT People increasingly give up owning cars in favour of calling companies to pick them up wherever they are and drop them off wherever they need to go — a driverless version of a ride-sharing service.

LAND USE Urban centres begin to undo the many accommodations they have made for personal vehicles — starting with the vast quantities of real estate devoted to parking, which could be adapted to more productive uses.

… One estimate of the number of US parking spaces. Many could be used for other purposes if people ride-share more.
INTERRUPTED


Babies are increasingly 
surviving premature 
birth — but researchers 
are only beginning to 
understand the lasting 
consequences for their 
mental development. 


BY ALISON ABBOTT 


Fabienne never found out why she
went into labour three months too
early. But on a quiet afternoon in
June 2007, she was hit by accelerating
contractions and was rushed to
the nearest hospital in rural Switzerland, near
Lausanne. When her son, Hugo, was born at
26 weeks of gestation rather than the typical 40,
he weighed just 950 grams and was immediately
placed in intensive care. Three days later,
doctors told Fabienne that ultrasound pictures
of Hugo’s brain indicated that he had had a
severe haemorrhage from his immature blood
vessels. “I just exploded into tears,” she says.

Both she and her husband understood that
the prognosis for Hugo was grim: he had a very
high risk of cerebral palsy, a neurological
condition that can lead to a life of severe disability.
The couple agreed that they did not want to
subject their child to that. “We immediately told 
the doctors that we did not want fierce medical 
intervention to keep him alive — and saw the 
relief on the doctors’ faces,” recalls Fabienne, 
who requested that her surname not be used. 
That night was the most tortured of her life. 
The next day, however, before any change 
had been made to Hugo’s treatment, his doc-
tors proposed a new option to confirm the
diagnosis: a brain scan using magnetic reso- 
nance imaging (MRI). This technique, which 
had been newly adapted for premature babies, 
would allow the doctors to predict the risk of 
cerebral palsy more accurately than with ultra- 
sound alone, which has a high false-positive 
rate. Hugo’s MRI scan showed that the damage
caused by the brain haemorrhage was limited,
and his risk of severe cerebral palsy was likely
to be relatively low. So just 24 hours after their 
decision to let his life end, Hugo’s parents did 
an about-turn. They agreed that the doctors 
should try to save him. 

Thanks to medical advances since the 
1970s, premature infants — those born before 
37 weeks of gestation — are increasingly able 
to survive. Some hospitals now try to save 
babies born as early as 22 weeks. But those 
developments are forcing doctors and parents 
to grapple with difficult decisions, because the 
chances of severe disability increase with the 
extent of prematurity. Cerebral palsy, for exam- 
ple, affects 1-2% of babies born at term, 9% of 
those born earlier than 32 weeks and 18% of 
those born at 26 weeks. 

That is just half the story. Neuroscientists 
are developing an increasingly sophisticated 
picture of premature infants’ brains that could 
help to inform medical decisions and treat- 
ments. From some long-term studies, they are 
learning that premature children face a higher 
risk than was previously thought of developing 
cognitive or behavioural problems — according 
to some studies, as many as half of them will. 

Researchers are starting to ask why this 
should be, whether it could be avoided and 
what is the best way to provide educational 
support for the affected children. “We need to 
gather a lot more data to understand what the 
best strategies are,” says Petra Hüppi, a neo-
natologist and developmental paediatrician at
the University of Geneva in Switzerland, who 
is following the brain development of children 
who were born prematurely. 


EARLY BIRTHDAY 

Prematurity — also called pre-term birth — 
is extremely common. According to World 
Health Organization statistics from 2012, 
more than one in 10 babies — around 15 mil- 
lion in total — are born prematurely each 
year. The great majority are born between 32 
and 37 weeks of gestation, but 1.6 million are 
born between 28 and 32 weeks and 780,000 
are born ‘extremely pre-term’, before 28 weeks
(see ‘Born too soon’).

In low-income countries, more than 90% 
of extremely pre-term babies born alive soon 
die, which helps to explain why prematurity 
is now the second biggest cause of death in 
children under five, after pneumonia. But in 
richer countries, with sophisticated neonatal 
intensive-care facilities, more than 90% of these 
extremely pre-term babies survive, and doctors 
are continuing to push the age of survival even 
earlier in development. Doctors in the United 
States are debating a controversial recommen- 
dation to lower the gestational age at which a 
baby should be considered potentially viable 
from 24 weeks to 23 weeks. In Japan, babies 
born at 22 weeks have been considered viable 
since 1991. 

Parents of premature children face agoniz- 
ing waits as their children fight for their lives. 


Hugo’s parents endured tense weeks during 
which their son had a series of operations to 
fix damaged organs, and to create essential 
connections between major blood vessels that 
had not had time to develop before birth. They 
knew he could still die at any time. “But I felt 
like we were back on the TGV,” says Fabienne,
referring to the French high-speed trains. “The
train goes fast and it rocks frighteningly — but
we were on it again.”

But what happens after the immediate dan- 
ger has passed? Just a few studies have so far 
followed up the long-term fate of premature 
babies, because it is time-consuming and
expensive to track them with sophisticated cog-
nitive and behavioural tests over many years.

One of the first studies to show the extent of 
developmental problems was EPIPAGE, which 
looked at a cohort of all live births between 
22 and 32 weeks of gestation from 9 regions 
of France in 1997, and a reference group of 
664 full-term babies (ref. 1). Up to half of the prema-
ture babies who survived to five years of age
had some sort of neurodevelopmental prob- 
lem by then, and the impairments in cognitive 
development grew more pronounced for each 
extra week of prematurity. On a score of cogni-
tive ability, the team observed impairment in
44% of those born between 24 and 25 weeks of 
gestation and 26% of those born at 32 weeks, 
compared with 12% of full-term controls. 

“We were shocked to see just how many chil-
dren had problems,” says Hüppi. Moderately
premature babies may be at lower risk than 
extremely premature babies, she notes, but 
there are many more of them. 

The effects seem to continue into adulthood. 
Developmental psychologist Dieter Wolke 
led an unusual study of hundreds of children 
born between 26 and 31 weeks of gestation in 
Bavaria in the mid-1980s. He assessed them 
at six years old, and again at 26 years. Last
year, he reported that most of those who had
cognitive problems as children still had them 
as adults: one-quarter of them had moderate 
to severe cognitive deficits, and half had mild 
cognitive deficits. Most of those who expe- 
rienced problems had short attention spans, 
and as a group they tended to underachieve 
academically and career-wise. 

Wolke, who is currently at the University 
of Warwick, UK, observed subtler lifestyle 
differences, too. “They are less likely to take
risks, smoke, drink or have early sexual rela- 
tionships,” he says. 

Scientists are still struggling to understand 
the physical changes in the brain that underlie 
all these differences. The brain is made up of 
grey matter, which comprises densely packed 
cell bodies, and white matter, the long-reach 
axons of cells that connect different brain 
regions. These axons are covered in a protective 
coating called myelin during development, in a
precise sequence that begins in the womb and 
continues for the first decade or so after birth. 

In the premature brain, immature, fragile 
blood vessels struggle to provide tissue with 
enough oxygen for normal development. When 
a vessel ruptures, crucial areas of white mat- 
ter are destroyed and cerebral palsy can result. 
But very little is known about what causes the 
more subtle brain problems that cohort studies 
of premature infants are revealing. 


TOO MUCH TOO SOON 

Scientists suspect that when the brain is forced 
to carry out a crucial part of its development
while the child is in the outside world instead 
of a warm, watery womb, it receives inap- 
propriate signals from the environment that 
affect how its neurons are linked into net- 
works. “The premature brain gets subjected 
to quite different sensory inputs — like visual 
stimulation and gravity effects — which it is 
not supposed to be subject to,” says Ghislaine 
Dehaene-Lambertz of the INSERM-CEA Cog- 
nitive Neuroimaging Unit in Paris, who studies 
language development in infants. “They can 
be sudden, intense but also unpredictable.” 
Some of these unnatural sensory signals are 
inevitably provided by the intensive medical 
procedures that keep premature babies alive. 

Pioneering brain-scanning studies support 
the idea that altered networks play a part in 
cognitive problems. Hüppi’s Swiss collaboration
looked at 52 six-year-olds who had been
born prematurely, using MRI scans optimized
to reveal tracts of neurons connecting brain
regions. Compared with children born at
term, the premature children’s neuronal tracts 
were organized less efficiently, often taking 
a more meandering path. These changes in 
organization were correlated with reduced 
social and cognitive skills. 

In another study, neonatologist Jeffrey Neil, 
then at St. Louis Children’s Hospital in Mis- 
souri, and his team used functional MRI to 
study the premature brain at rest. The low- 
level, idling activity of a resting brain gives a 
read-out of its working connections, whose 
general topology is laid out before birth (see 
Nature 489, 356-358; 2012). The team showed
that in babies born between 23 and 29 weeks 
of gestation, this ‘resting-state connectiv- 
ity’ tends on average to be less complex and 
active at term-equivalent age than it is in full- 
term babies at birth. Another study — on the 
26-year-old Bavarians — showed that this
reduced complexity of resting-state connectivity
stretches into adulthood.

[Figure: BORN TOO SOON
Just over 11% of live births worldwide are pre-term — before 37 weeks of gestation — and premature
birth is the second largest cause of child deaths in under-fives. The medical risks increase with the extent
of prematurity; neuroscientists now think that some effects on brain development may last into adulthood.

Completed weeks of pregnancy (axis from 16 to 40, spanning the second and third trimesters):
Extremely pre-term: <28 weeks
Very pre-term: 28-32 weeks
Moderate or late pre-term: 32-37 weeks
Term: 37-42 weeks
Post-term: >42 weeks

22 weeks: Some hospitals now try to save babies born as early as 22 weeks.
22-32 weeks: One long-term study found that up to half of children born in this window have some neurodevelopmental problem at age five.]

Researchers agree that the most revealing 
studies would monitor the brains of prema- 
ture babies and full-term comparison babies 
from as early as possible after birth, with 
follow-up scans and assessments throughout 
life. But such studies are difficult, and not only 
because it is hard to keep tabs on families who 
may move house, lose interest or lose touch 
over the years. Parents are rarely keen for their 
newborns — whether premature or full-term 
— to be whisked away into the loud and lonely 
chamber of a distant MRI machine without a 
burning medical reason. (In some countries, 
such as the Netherlands, it is illegal to do so.) 
And not all obstetricians are comfortable with 
subjecting delicate premature babies to brain 
scans at a medically and emotionally fraught 
time. Fabienne was happy for Hugo to be 
scanned, but recalls how painfully long it took 
to get from the paediatric ward to the scanning 
suite in her hospital. “I felt half-dazed walk- 
ing alongside Hugo in his incubator through 
a long underground tunnel to get there,” she
says. “It looked like the tunnel you are sup- 
posed to see when you are dying.” 

A small vanguard of scientists and clinicians 
is pushing ahead, and several large long-term 
studies are under way around the world, col- 
lecting neurological, cognitive, behavioural and 
genetic data from birth, along with brain scans. 

In France, EPIPAGE 2 is now running, 
and has recruited more than 4,200 premature
babies from all over the country. In the
United Kingdom, a team led by neonatologist 
David Edwards of King’s College London has 
launched a study that will track children from 
their time in utero until they are two years old, 
collecting brain scans and blood samples along 
the way. Some of these children will inevitably 
be born prematurely, and the plan is to identify 
molecular signatures that might predict which 
of those infants are particularly vulnerable, or 
resistant, to altered neurodevelopment. 

Edwards’ preliminary studies on premature
babies suggest that some genes — including
several associated with lipid metabolism, which
is crucial for white-matter development — may
modify the risk of altered brain development.
“Having a particular genetic profile might make
certain babies less vulnerable,” he says.

[Figure callouts, ‘Born too soon’: 50% chance of survival with neonatal intensive care in most high-income countries. Brain-scanning studies point to atypical structural and functional connections in the brains of premature infants.]


BRAIN PROTECTION 

With scientists still working to identify the 
molecular, cellular and network differences 
in the premature brain, finding treatments 
seems a fond hope. But Hüppi is attempting
to do it. She is conducting a clinical study of 
erythropoietin, or EPO, a drug that stimulates 
the production of red blood cells. It is already a 
standard treatment to aid oxygenation of inter- 
nal organs — not to mention being a favourite 
among endurance-sport cheats — and it is also 
thought to protect and support neurons. 

Anecdotal reports had suggested that erythropoietin
might help long-term neurodevelopment, and Hüppi’s
team is assessing this in a prospective, randomized
and controlled study in nearly 500 very premature
babies born in Switzerland, who are being MRI scanned
at term-equivalent age. The first results, published
in 2014, showed that treated babies had fewer signs of
neurological problems than did children in a control
group. But the acid test, says Hüppi, will come when
they are assessed at two years old, when
neurodevelopment has proceeded further.

Where does all this leave parents, who still 
have to make decisions about their children’s 
treatment with only limited information about 
the long-term prognosis? Some, such as Fabi- 
enne, can be helped by MRI scans that can 
detect damage in white and grey matter, and 
make it possible to predict the risk of severe 
brain damage more precisely than in the past. 
Hüppi says that the technology helps doctors to
advise parents, “and it is a terrible responsibility 
if we are wrong”. But this does little to identify 
which children will have milder developmental 
problems, or what those might be. 

Edwards and others think that brain imaging 
alone can never provide that type of information 
— but that combining scans with genetic and 
other molecular and clinical data may eventually
lead to much greater precision. Should this
become possible, it will throw open a whole new
debate about how best to ameliorate any future
problems for premature children, through
very specific social and educational support —
something that neuroscientists and education
experts are only beginning to grapple with.

[Figure callouts, ‘Born too soon’: 50% chance of survival with neonatal intensive care in many low-income countries. Cerebral palsy affects around 10% of very pre-term babies, but 18% of those born at 26 weeks.]

Fabienne, like many parents of premature 
children, would like to have that information. 
Hugo, who is now seven, occupies most of her 
time. He has difficulties with fine movements, 
and some visual problems; he also needs a 
lot of extra assistance at school. Fabienne is 
deeply engaged with educational training 
programmes, which she hopes will be helpful, 
although she cannot know for sure. But Hugo 
is an unadulterated joy to her, and she is end- 
lessly grateful for the MRI scans that were so 
crucial in the decision to save him. “Neuro- 
science was able to say that Hugo would be able 
to have a reasonable quality of life,” she says. 

And she monitors from a distance the new 
wave of scientific interest in the brains of pre- 
mature babies. “Neuroscience is coming up 
with a lot of good information — I really hope 
that they will soon translate what they are dis- 
covering into concrete actions that parents can 
usefully undertake.”


Alison Abbott is Nature’s senior European
correspondent. 
Standardize antibodies
used in research 


To save millions of dollars and dramatically improve reproducibility, protein-binding 
reagents must be defined by their sequences and produced as recombinant proteins, 
say Andrew Bradbury, Andreas Plückthun and 110 co-signatories.


Central to reproducibility in biomedical
research is being able to use reagents
that are identical to those described
in publications. Alarmingly, there are serious
flaws in the reliability of antibodies, the
most widely used class of protein-binding
reagent.

In the body, antibodies help to fight patho- 
gens. In the lab, biologists have long used 
them to track proteins of interest because they
bind to specific targets. But in a 2008 study,
fewer than half of around 6,000 routinely used 
commercial antibodies recognized only their 
specified targets, with some manufacturers 
producing consistently good antibodies, and 
others consistently poor ones. 

This figure may be optimistic. In fact,
we believe that poorly characterized and
ill-defined antibodies were in large part to
blame for a study co-authored by C. Glenn
Begley (a co-signatory to this article) being
able to replicate the scientific results of only
6 of 53 landmark preclinical studies. Across
biomedical research, the resulting waste in 
materials, time and money is vast — costing 
an estimated US$350 million annually in the 
United States alone. 

To stem this loss, we call for an 
international collaboration and funding 
initiative to define all binding reagents
according to the sequences that encode
them. Crucially, researchers should use 
recombinant antibodies or binding reagents. 
These are made from reliable cell lines by 
isolating and incorporating the genes into 
plasmid DNA and transferring the plasmids 
into cells or bacteria for culture. 


WILDLY VARIABLE 

Researchers have used polyclonal antibodies 
for decades. These are produced by injecting 
a target (typically a protein) into an animal 
such as a rabbit and using the resulting serum 
as a source of antibodies. However, only 
0.5-5% of the antibodies in a polyclonal rea- 
gent bind to their intended target. And func- 
tionality varies from batch to batch, because 
immunizing an animal — even the same one 
— never results in exactly the same mix of 
antibodies, making it hard for researchers 
to be sure of the specificity of any particular 
batch of binding reagent obtained in this way. 

Four decades ago, the first monoclonal 
antibodies were made — by fusing a normal 
antibody-producing B-lymphocyte cell with 
a cancer cell to produce a ‘hybridoma’. The 
biomedical community believed that the 
resulting cell lines that produced (ideally) a 
single antibody species would solve many of 
the challenges of polyclonals. Unfortunately, 
monoclonals are far from problem-free. 

Hybridoma cell lines can die off, lose their 
antibody genes, or simply not grow when 
taken out of frozen storage — meaning that 
the source of a particular monoclonal anti- 
body may be lost forever. Furthermore, such 
antibodies may bind to more than one tar- 
get, either because the antibody is actually a 
mixture of antibodies with multiple specifi- 
cities, or simply because it is able to bind to 
several proteins. Careful characterization is 
thus required. 

Most pharmaceutical and large 
biotechnology companies have whole depart- 
ments dedicated to validating and character- 
izing antibodies. Consequently the reagents 
used in most clinical trials and especially in
medical procedures cleared by the US Food
and Drug Administration or the European
Medicines Agency are extremely reliable.
Outside clinical trials, reagents are rarely
validated to the same degree. What is more,
only 44% of publications provide enough
information — for instance, on the supplier —
for researchers to be able to purchase the same
antibody. The quality of the documentation
that accompanies batches (such as on functionality
in different assays) is enormously
variable; even when it is provided, it may not
correspond to the batch supplied.

[Figure: RELIABLE BINDING REAGENTS FOR ALL
Reagent sequences in database, with three distribution methods:
In-house production: Researchers order gene sequences and make their own reagents in house.
Commercial distribution: Companies stockpile commonly used reagents or generate reagents on demand.
Non-profit distribution: Non-profits stockpile commonly used reagents or generate reagents on demand.]


TWO STEPS 

If all antibodies were defined by their 
sequences and made recombinantly, 
researchers worldwide would be able to use 
the same binding reagents under the same 
conditions. Immortal production lines of 
recombinant antibodies — which express no 
extra antibody chains — can be engineered 
by incorporating plasmids containing 
antibody DNA into cell lines. 

In practice, improving the quality of 
protein-binding reagents will require two 
steps. First, the sequences should be obtained 
for widely used hybridoma-produced mono- 
clonal antibodies. These antibodies should 
thence be produced recombinantly. (Poly- 
clonal antibodies should be phased out of 
research entirely.) 

Second, the research community should 
turn to methods that directly yield recombi- 
nant binding reagents that can be sequenced 
and expressed easily. These include display 
and two-hybrid methods — in which the 
best binders are selected from billions of 
variants — as well as approaches in which 
antibodies are identified from the sequenc- 
ing of millions of animal or human B cells 
after immunological challenge. 

Using sequence information as a universal
reference system, researchers will be able to
choose the binding reagents best suited to their
requirements and use them in a standardized
way (see ‘Reliable binding reagents for all’).


Various classes of binding reagents are being
developed in addition to antibodies, includ- 
ing recombinant protein scaffolds — proteins 
with artificially introduced binding surfaces
— and nucleic-acid-based binding reagents.
Some alternative reagents are easier to use and 
manufacture than antibodies. 


MARKET FORCES 

In our view, producing recombinant versions 
of the antibodies commonly used in research 
could be commercially profitable — even 
when the sequence information is publicly 
available. The absence of these reagents in 
the research-antibody marketplace stems 
mostly from economic considerations, rather 
than technical challenges. Most commercial 
producers have simply been attracted to the 
more lucrative therapeutics market. 

Production costs for recombinant 
binding reagents — currently similar to 
those for monoclonals — should decrease 
as technologies improve, demand increases 
and processes become automated. Although 
researchers could, in principle, make the 
recombinant reagents themselves if the DNA 
sequences are freely available, most will pre- 
fer the convenience of receiving a quality- 
controlled product from a supplier. On this 
basis, two years ago, a UK start-up called 
Absolute Antibody began selling sequenced 
monoclonal antibody reagents produced 
recombinantly. Many of the sequences are 
already in the public domain. 

That said, market forces alone are not going 
to improve the quality of protein-binding rea- 
gents — as the past 30 years have shown. Of 
the nearly 2 million antibodies listed in the 
database CiteAb, most are used too rarely in 
research to constitute an appealing commer- 
cial prospect. 

Achieving a wholesale transition to 
characterized, recombinant protein-binding 
reagents will require the public funding agen- 
cies of the world’s largest economies — North 
America, Europe and Asia — to make a major 
investment in technology development. This
would reduce costs and increase efficiency in
the key bottlenecks: the production of reagent
targets, the selection of binding reagents and
their downstream characterization. Five-year
pilot programmes initiated in 2010 — the
Protein Capture Reagents Program of the
US National Institutes of Health (NIH) and
Affinomics by the European Union (EU) —
have indicated that recombinant technologies
could be scaled up. These programmes
should be expanded and investment in them
sustained for at least a decade.

[Figure, continued: RELIABLE BINDING REAGENTS FOR ALL
Making the sequences of all binding reagents freely available would give researchers and suppliers a universal reference system.
Standardized binding reagents: recombinant monoclonal antibodies; aptamers (nucleic-acid-based reagent); recombinant antibodies; non-antibody binding reagents.]

[Figure: MONEY DOWN THE DRAIN
The use of poorly characterized and ill-defined antibodies wastes materials, researcher time and money.
Global spending on protein-binding reagents: $1.6 billion.
Annual spending on ‘bad’ antibodies in the United States: $350 million. Nearly half of all money wasted on ‘bad’ antibodies worldwide is spent in the United States.
All cost estimates assume that 50% of antibodies are validated and that researchers buy ‘bad’ antibodies as often as they buy ‘good’ ones.]

We estimate that using current technol- 
ogy, roughly $1 billion would be required to 
generate characterized recombinant bind- 
ing reagents to target the primary products 
of all 20,000 human genes. This is probably 
less than what is wasted worldwide on bad 
reagents in two years (see ‘Money down the
drain’), and would be easily recouped over the
long term thanks to more reproducible data. 
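
The comparison above can be checked with simple arithmetic. The sketch below is a back-of-envelope reading, not the authors' own calculation: it takes the article's quoted figures ($1.6 billion in annual global spending on protein-binding reagents, $350 million wasted annually in the United States, and a one-off investment of roughly $1 billion) and assumes, following the note on the ‘Money down the drain’ graphic, that half of all purchases go on ‘bad’ antibodies. All variable names are illustrative.

```python
# Back-of-envelope check of the cost comparison, using figures quoted in the article.
# The 50% 'bad' fraction is the assumption stated in the article's graphic;
# treating it as a simple multiplier on global spending is our own simplification.

GLOBAL_REAGENT_SPEND_USD = 1.6e9  # annual global spending on protein-binding reagents
BAD_FRACTION = 0.5                # assumed share of spending that goes on 'bad' antibodies
US_WASTE_USD = 350e6              # estimated annual waste in the United States
ONE_OFF_INVESTMENT_USD = 1e9      # estimated cost of reagents for all 20,000 human genes


def annual_global_waste(spend_usd: float, bad_fraction: float) -> float:
    """Estimate yearly worldwide waste as a fixed fraction of total spending."""
    return spend_usd * bad_fraction


waste_per_year = annual_global_waste(GLOBAL_REAGENT_SPEND_USD, BAD_FRACTION)
waste_two_years = 2 * waste_per_year
us_share = US_WASTE_USD / waste_per_year
payback_years = ONE_OFF_INVESTMENT_USD / waste_per_year

print(f"Estimated global waste per year: ${waste_per_year / 1e9:.1f} billion")
print(f"Estimated global waste over two years: ${waste_two_years / 1e9:.1f} billion")
print(f"US share of global waste: {us_share:.0%}")
print(f"Years of waste equal to the investment: {payback_years:.2f}")
```

Under these assumptions, global waste comes to $0.8 billion a year, so two years of waste ($1.6 billion) exceeds the proposed $1 billion investment, and the US figure is about 44% of the global total, consistent with ‘nearly half’.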

However, we do not advocate the stock- 
piling of reagents against targets of minimal 
interest. Instead, efficient pipelines should be
developed to generate any high-quality binding
reagent on demand. One possibility is that
centres funded by public and private funds 
would focus on target production, selec- 
tion, characterization and the publishing of 
sequence information based on requests from 
users, whereas commercial companies would 
specialize in producing and distributing the 
reagents. An instructive analogue is the Struc- 
tural Genomics Consortium. This partner- 
ship has over the past decade used public and 
private funds to generate effective produc- 
tion lines for human proteins, deployed by 
academia and pharmaceutical companies. 

As a first step, we ask the scientific 
leadership of the NIH and the EU to convene 
academic users, technology developers, bio- 
tech companies, funding agencies and pub- 
lishers, and establish a realistic timetable for 
the transition to these high-quality binding 
reagents. 

Making sequenced, well-characterized
reagents is unlikely by itself to change the
behaviour of researchers. One possible
outcome of such a meeting could be that
publishers and funding agencies should
mandate that in, say, five to ten years’ time,
and contingent on appropriate investment, 
all binding reagents in published papers are 
recombinant and defined at the sequence 
level. This would mirror the requirements 
for the past few decades that gene sequences 
and coordinates for new protein structures 
be deposited and made publicly available. 
If these steps are taken, scientists will not 
want to use unsequenced binding reagents, 
and the absence of sequencing information 
will lead to market disadvantages for ven- 
dors. The uncharacterized, unsequenced 
research antibody will become obsolete.

Start research on
climate engineering 


Safe, small-scale experiments build trust and road-test governance, argue 
Jane C. S. Long, Frank Loy and M. Granger Morgan. 


Climate engineering — cooling Earth
intentionally by modifying its radia- 
tion balance — worries many peo- 
ple. We know little about the effectiveness 
of these technologies or their side effects. 
The unintended consequences could be 
profound. One country’s interventions 
will affect others and could distract from 
climate-change mitigation efforts, and
there is no international mechanism for
regulating such deployments. These are 
legitimate concerns. 

But interventions may need to be con- 
sidered in the future. The 2013 report of 
the Intergovernmental Panel on Climate 
Change suggested that even if the world 
almost eliminates greenhouse-gas emissions
by mid-century, decades of climate
engineering — such as removing carbon
dioxide from the atmosphere or injecting
reflective particles into the stratosphere —
might be required to control global temperatures
and preserve vulnerable populations
and ecosystems.

Yet the climate-science community has 
largely avoided the subject. Government-funded
research has been restricted to
modelling and social-science investigations.
The few outdoor experiments that
have tested concepts were either funded
privately or performed as pure climate
science without making the climate-engineering
intent clear. Such experiments
fail to ensure two fundamental principles
of good governance of climate-engineering
research: transparency and that the
research is for the public good.

[Image: Ship trails (clouds seeded by particles in ship exhaust) off North America on 20 January 2013.]


SOLID UNDERSTANDING

We believe that this laissez-faire approach
is risky and imprudent. As the consequences
of climate change become starker,
public calls for interventions may grow.
Governments or companies may try cli- 
mate engineering to reduce the severe 
impacts predicted by 2050. Our ignorance 
of the benefits and problems could become 
dangerous. 

Several reports and institutions have 
called for climate-engineering research to
commence. We agree. We must start now: 
gaining a solid understanding of any cli- 
mate-engineering technique will take dec- 
ades. Small-scale outdoor experiments in 
particular are needed to provide real-world 
answers to questions about the efficacy 
and advisability of climate engineering. 
Even the United Nations Convention on
Biological Diversity, which has discussed a 
ban on climate engineering, endorses such 
experiments (see go.nature.com/vopjwg). 

But how should research get started? 
Should governance be developed before 
or after early experimentation? Some 
have called for a moratorium on climate- 
engineering research until an interna- 
tional governance regime is in place (see 
go.nature.com/rfx56p). We disagree. A 
ban would push research underground 
and towards private funding where risky 
experiments may proceed ungoverned. 
Or experiments might be conducted with 
no more than usual research governance. 
Neither approach is good. 

We argue that governance and experi- 
mentation must co-evolve. We call on the 
US government and others to begin pro- 
grammes to fund small-scale, low-risk 
outdoor climate-engineering research and 
develop a framework for governing it. 


START SMALL 

Although they do not require approval by 
an international body, small-scale experi- 
ments are an opportunity for international 
collaboration. Countries that have worked 
together on small-scale research and par- 
ticipated in developing governance models 
will be in a better position to agree how 
to handle risky research should that time 
ever come. 

Opponents of climate-engineering 
research have claimed that the only useful 
outdoor research requires perturbing the 
climate. That is wrong. Many small-scale 
experiments would have no measurable 
effect on Earth’s climate. The physical
and chemical processes on which interven- 
tions rely need testing and quantification 
at small scales before any climate impacts 
are assessed. Experiments that extend 
up to kilometres in altitude and last days 
to weeks would leave the global climate 
unchanged but would increase scientific 
understanding substantially. 

Some useful low-risk experiments have 
already been identified. Injecting a small
amount of sulfur into the stratosphere over 
several weeks would show how fine par- 
ticles evolve and affect ozone depletion; 
spraying salt particles into coastal clouds 
would assess whether cloud reflectivity 
can be increased; seeding high-latitude cir- 
rus clouds with artificial ice nuclei would 
determine whether the clouds can be dissi- 
pated and allow more long-wave radiation 
to escape from Earth. 

These small-scale tests look a lot like 
climate-science experiments, and cli- 
mate-change science will also benefit from 
them. Making the intent of the research 
clear allows governance strategies to be 
explored. All proposals should address 
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TESTING CLIMATE ENGINEERING 


Value. Climate-engineering experiments 
should have social as well as scientific 
benefits, for example by reducing major 
climate-change uncertainties such as the 


roles of clouds and aerosols in moderating 


Earth’s energy balance. The research 
should generate new understanding of 
the risks, effectiveness and advisability of 
climate engineering. 


Risk. Researchers should evaluate and 
minimize their proposal’s downsides — 
known, predicted or perceived. Small, 
short-lived projects raise fewer concerns 
than large and long ones. Avoid concepts 
that would permanently alter the 
environment. Comparing impacts with 
those of other common activities, such as 


flying aircraft in the stratosphere, maintains 


perspective. 


five governance considerations: value, risk, 
transparency, vested interests and legal 
requirements (see ‘Checklist for funding 
research). 

Learning about governance does not 
follow automatically. The SPICE (Strato- 
spheric Particle Injection for Climate 
Engineering) experiment proposed for 
the United Kingdom in 2010 is an exam- 
ple of how not to proceed. This project, 
which would have simply sprayed water 
from a hose attached to a tethered balloon, 
was abandoned after it failed to win pub- 
lic support and when conflict-of-interest 
issues emerged over a patent application 
for the system. It aimed to test a mecha- 
nism by which climate-altering chemicals 
could be introduced into the atmosphere 
to reflect sunlight — but before scientific 
uncertainties about the effectiveness and 
advisability of any such interventions had 
been resolved. Furthermore, little stood to 
be learned from the experiment, because 
the hose would have operated at a lower 
altitude than required for climate engineer- 
ing. The project became a lightning rod for 
public concern and was cancelled. 

Government agencies and scientists 
should begin climate-engineering research, 
learn to govern it and prepare for interna- 
tional collaboration. We recommend the 
following first steps (developed through 
discussions at the 2014 Solar Radiation 
Management Governance Initiative work- 
shop in San Francisco, California). 


FIVE STEPS 
First, pick a good test case for an outdoor 
research project. This will establish a track 


Checklist for funding research 


Transparency. To maintain trust and 
ensure that society can learn how to govern 
climate-engineering research, scientists 
should conduct experiments openly, 
facilitate deliberation and oversight, and 
inform decision-making. Researchers 
should clearly explain to the public an 
experiment’s scientific context, its intent, 
method, alternatives, the expected and actual 
outcomes, and how research questions 
evolve as a result. 


Vested interests. Financial interests and 
intellectual-property rights may influence 
research or lead to political pressure 

that does not serve the public interest. 
Researchers and institutions could have 
positive biases about their climate- 
engineering concepts for professional, 
intellectual or personal reasons°. 


record for dealing with controversy, scru- 
tiny and outreach. The initial experiment 
should yield valuable scientific insight and 
be defensible, in that it is brief and poses no 
significant risk. 

Second, clearly identify the research as 
climate engineering. Obfuscation will vio- 
late public trust and obviate co-evolution 
of governance. 

Third, seek broad advice early to identify 
potential social risks and societal benefits. 

Such understand- 


“As the ing will help when 
consequences of deciding whether 
climate change to stop or pro- 
becomestarker,  ©¢e4- Think of it 
public calls for as a rehearsal for 
interventions constructing an 
may grow.” advisory body, 


should the gov- 
ernment decide to 
establish a strategic research programme. 

Fourth, discuss climate engineering 
within the broader context of climate- 
change strategy. Climate engineering can- 
not substitute for mitigation or adaptation, 
but it might (or might not) provide cru- 
cial tools in a holistic and strategic plan 
for dealing with the inevitable impacts of 
global change. 

And fifth, assess the early work and 
decide whether and how to proceed. What 
was learned? Do the results render any 
subsequent approaches untenable or indi- 
cate that a modification would be more 
effective or more advisable? What new 
scientific issues are identified? What are 
the next steps? If public concerns are
raised, how can engagement be more
effective and useful?


Governance methods beyond normal peer
review are needed to check that conflicts of
interest do not bias evaluation. For example,
a second team could be asked to confirm or
find errors in research done by another.


Legal considerations. Larger-scale research
may require environmental regulatory review.
For example, the United States may demand
an environmental impact assessment or
statement under the National Environmental
Policy Act, Clean Air Act or Clean Water Act.
Experiments that cross national borders
must abide by customary international
law or United Nations treaties such as the
Framework Convention on Climate Change,
the Convention on Biological Diversity, or the
Convention on the Law of the Sea. If there
is foreseeable harm, consent among the
affected parties should be determined.


Government agencies must take these 
steps. To ensure transparency and public 
trust, outdoor experiments in climate- 
engineering should be publicly, rather than 
privately, funded. 

We urge researchers to come forward 
with well-crafted proposals that meet the 
test-case requirements. Global collabora- 
tors should be engaged as a precursor to 
more formal international cooperation.
Physicist Bruno Pontecorvo (left) defected to the Soviet Union in 1950.


NUCLEAR PHYSICS 


New light on a cold 
war conundrum 


Sharon Weinberger ponders a chronicle claiming that 
fresh evidence has cracked the ‘Pontecorvo affair’. 


When the gifted Italian nuclear
physicist Bruno Pontecorvo vanished
with his family in the late
summer of 1950, no one took any notice.
It was more than two weeks before friends, 
family and, most importantly, the intel- 
ligence agencies of the United States and 
Britain — both countries where Pontecorvo 
had worked — realized that he had gone. 
And it was another five years before it was
confirmed that Pontecorvo had defected to
the Soviet Union. 

The defection is one of the last great 
unsolved cold-war science mysteries. Did 
Pontecorvo, apparently on his way to win- 
ning a Nobel prize, flee because he was a 
devoted communist facing a hostile politi- 
cal climate? Or was he a spy running to his 
handlers? Physicist Frank Close believes that 
he has the answer. In Half-Life, Close explores
what is now known as the Pontecorvo affair.
Woven into that chapter in the physicist’s life 
are two love stories: his Swedish wife's love for 
him, and his love for communism. Neither 
ended well. 

Born in 1913 into a successful Jewish family 
near Pisa, Pontecorvo pursued a career in 
physics, first as one of Enrico Fermi’s ‘Via
Panisperna’ boys, and then leaving for Paris
in 1936 to escape the rise of fascism and
anti-Semitism. In France, he worked in the
lab of Irène and Frédéric Joliot-Curie, and
was swept up in the communist movement.
Here he met and married his wife, Marianne
Nordblom. As the war made its way to France,
Pontecorvo and Marianne, now with a child,
left for Tulsa, Oklahoma, where he used his
expertise in looking at how neutrons interact
with materials to create a new method for oil
prospecting. But Pontecorvo’s politics drew
the attention of the FBI. 

The federal investigators were not crack 
spy hunters. FBI agents visited Pontecorvo’s
Tulsa home in 1942 — when the United States 
was at war with Italy — and found commu- 
nist literature in plain view. The couple's sec- 
ond son was named Tito, after the leader of 
communist Yugoslavia. Short of flying the 
hammer and sickle, it is hard to see how Pon- 
tecorvo could have been more open about his 
leanings. Yet the FBI merely wrote a report, 
which languished for several years. In the 
meantime, Pontecorvo moved to Canada to 
work on the Manhattan Project at the Chalk 
River Laboratories, and in 1949 left for the 
United Kingdom to work for the Atomic 
Energy Research Establishment at Harwell. 

All seemed well until midway through the 
fateful summer holiday in 1950. The family, 
by then with three children, suddenly made 
a dash for Sweden, then Finland; here, Soviet 
agents secreted them across the border, with 
Pontecorvo hiding in the boot of a car. That 
marked the beginning of what Close calls the 
physicist’s “half-life”: the end of his Western 
career and the beginning of his life as Soviet 
scientist Bruno Maksimovich Pontecorvo. 

Too many books are fêted as reading ‘like
spy novels’, but Close’s work deserves the
accolade. He makes a good circumstantial 
case for Pontecorvo being a spy. Some of the 

evidence is relatively convincing, such
as the assertion of a former KGB agent
who claims that Pontecorvo did spy for
the Soviets. On the weaker side, Close
cites two pictures of Pontecorvo in Canada,
standing with Chalk River scientists and
looking away from the camera. Close suggests
that he was already going to the US-Canadian
border to rendezvous with a Soviet agent, and
did not want his face recognized. Possibly; or
perhaps he looked away by accident.

Half-Life: The Divided Life of Bruno Pontecorvo, Physicist or Spy
FRANK CLOSE

A bigger problem with this spy narrative 
is that it lacks a denouement — the dramatic 
unmasking of a double agent. Close implies 
that he has finally got the goods in an archival 
document: a letter from the British Embassy 
in Washington DC to the director-general of 
the UK intelligence agency MI5, dated 13 July 
1950. It was this, Close writes, that would
“lead me to solve the mystery of Bruno
Pontecorvo’s sudden disappearance”.
The letter is proof that FBI reports were
finally making their way to Europe. Close
argues that British double agent Kim Philby,
who had access to it, tipped off Moscow; the
Soviets then warned Pontecorvo that he was
about to be exposed as a spy and arranged
for his retreat.

“Too many books are fêted as reading ‘like
spy novels’, but Close’s work deserves the
accolade.”

Close’s conclusion is in stark contrast to 
that of historian Simone Turchetti’s The Pon- 
tecorvo Affair (University of Chicago Press, 
2012). That book concluded that the Italian 
physicist was simply a committed commu- 
nist, whose flight was triggered by a US patent 
lawsuit that he feared would lead to political 
persecution. Close’s version is more plausible, 
but both lack definitive proof. 

Close is at his best when describing Pontecorvo’s
work in neutrinos and neutron detection,
demonstrating how groundbreaking it
was, in spite of later attempts by Western 
governments to downplay his importance. 
But Close also occasionally takes liberties for 
the sake of drama. He describes an interroga- 
tion room painted in grey and “mustard”, but 
his citation notes that he bases this scene on 
his knowledge of Soviet offices, not source 
materials. Almost nothing is known of Pon- 
tecorvo’s dealings with the Soviet govern- 
ment. It is one thing to evoke colour, quite 
another to paint with the entire palette. 

Half-Life reveals the real victim as Mari- 
anne, whose already shaky mental health 
deteriorated precipitously in the Soviet 
Union. Cut off from family and friends, 
she spiralled into depression and repeat- 
edly entered psychiatric institutions, while 
Pontecorvo took on a mistress. Pontecorvo’s 
love affair with communism did not end any 
better. He conceded in 1992, after the fall of 
the Soviet Union, that his dreams of a com- 
munist utopia were an illusion. “I was,” he 
told a reporter, “a cretin.”
Books in brief


The Experimenters: Chance and Design at Black Mountain College
Eva Diaz UNIVERSITY OF CHICAGO PRESS (2015) 

What links systems theorist and architect R. Buckminster Fuller with
artistic innovators such as Josef Albers and John Cage? The answer
is Black Mountain College, North Carolina. From 1933 to 1957, in
this unaccredited institution in Appalachia, they and other “artist-
scientists” created an iconic lab for experimental research in the
arts. As art historian Eva Diaz reveals in this engrossing study, their
explorations in materials, form, chance and indeterminacy were never
less than electrifying. Her sympathetic portrait of Fuller as a utopian
saving the world through geodesic geometry is particularly assured.


Four Revolutions in the Earth Sciences: From Heresy to Truth 
James Lawrence Powell COLUMBIA UNIVERSITY PRESS (2014) 

Deep time, continental drift, meteorite impact and climate change: 
each of these twentieth-century geoscientific discoveries was once 
viewed as heretical. So reminds geologist James Powell in this 
exemplary treatise on scientific progress. He traces the evolution of 
each landmark finding through the work of the dogged researchers 
who proved it, step by step. Many fights were hard-won, as shown 
in the efforts of Gene Shoemaker, Luis and Walter Alvarez, Robin 
Canup and others who established connections between meteorite 
impact, the birth of the Moon and the extinction of the dinosaurs. 


Finding Zero: A Mathematician’s Odyssey to Uncover the
Origins of Numbers
Amir D. Aczel PALGRAVE MACMILLAN (2015)

Mathematician Amir Aczel was obsessed from childhood with
the origins of numerals. This bracing mathematical detective
story reveals how he cracked the puzzle: by homing in on zero.
Close readings of classical texts convinced him that this subtle
concept emanated from the East. He treks through the findings of
archaeological scholar George Coedès, the surprising nexus of sex
and mathematics, and much of southeast Asia before hitting pay 
dirt with a seventh-century artefact in a dusty Cambodian shed. 


A Scientist in Wonderland: A Memoir of Searching for Truth
and Finding Trouble
Edzard Ernst IMPRINT ACADEMIC (2015)

During his 1993-2011 tenure as the world’s first chair in 
complementary medicine (at the University of Exeter, UK), Edzard 
Ernst scrutinized alternative medical treatments, turning up false 
claims and sparking a furore among enthusiasts. As he shows in this 
ferociously frank autobiography, his early career was as dramatic — 
during a stint as chair of rehabilitation medicine at the University of 
Vienna, he uncovered the institution’s historical involvement in medical 
experiments under the Third Reich. A clarion call for medical ethics. 


Pioneers of Neurobiology: My Brilliant Eccentric Heroes
John G. Nicholls SINAUER ASSOCIATES (2015)

This scientific memoir by neurobiologist John Nicholls takes the 
form of short biographical sketches focusing on the eccentricities 
of key people he has worked with or encountered, from Nobel 
prizewinners to lab technicians. It is quite a list, including Stephen 
Kuffler, Bernard Katz, Rita Levi-Montalcini, Gunther Stent and 
James Watson. Nicholls’s gently amusing anecdotes shed light on 
the developing environment of molecular neurobiology, mostly in 
Europe and the United States, since the 1950s. Barbara Kiser
More than 70 ways
to show resilience 


Ruben Dahm and his colleagues 
call for greater “flood resilience” 
in delta cities (Nature 516, 329; 
2014). But achieving resilience 
depends on what we mean by it: 
there are more than 70 definitions 
in the scientific literature. 

These definitions vary between 
two extremes, with most trying to 
achieve a balance between the two 
(M. de Bruijne et al. in Designing 
Resilience 13-32; Univ. Pittsburgh 
Press, 2010). Dahm et al. 
implicitly define resilience as the 
ability of a system to bounce back 
after stress, as do many politicians 
and the World Economic Forum 
in Geneva, Switzerland. 

At the other extreme, 
resilience is seen as “the capacity 
of social-ecological systems to 
adapt or transform in response 
to unfamiliar, unexpected and 
extreme shocks” as proposed 
by ecologist Stephen Carpenter 
and economics Nobel laureate 
Kenneth Arrow, among others 
(S.R. Carpenter et al. Sustainability 
4, 3248-3259; 2012). 

To recover, to adapt or to 
invest in both possibilities? All 
make sense in the right context. 
We might want a city to recover 
from flooding, for example, or 
the world to adapt to the effects 
of climate change. 

Long-term policies to promote 
either recovery or adaptation, or 
prepare for both, are likely to be 
very different. We must have a 
clear definition from the outset. 
Len Fisher University of Bristol, UK. 
len.fisher@bristol.ac.uk


Labs leak staff under 
French law 


Ironically, a well-intentioned law 
enacted in France almost three 
years ago to protect the poorly 
qualified is preventing young 
researchers from completing their 
postdoctoral training. In my view, 
research institutions should be 
exempted from its requirements. 
Called the Sauvadet law, it
stipulates that a worker must
be appointed to a permanent
position after six years of short-
term contracts in the public
sector. Since the law came
into effect in March 2012 (and
retrospectively), publicly funded
research institutions, including
INSERM and the CNRS, have
limited the number of postdocs
becoming eligible for tenure
by not renewing contracts if a
postdoc has already worked there
for three years (see go.nature.com/vki6fq
and go.nature.com/navteh; both in French).

Young researchers are therefore 
being forced to complete their 
training abroad. The law also 
means that invaluable laboratory 
engineers and senior technicians 
can no longer be retained 
under short-term contracts if 
a permanent position is not 
available. The nation urgently 
needs to put countermeasures in 
place, or risk losing crucial lab 
staff indefinitely. 

Juan Iovanna INSERM UMR1068, 
CNRS UMR7258, Aix-Marseille 
University and Institute Paoli- 
Calmettes, Marseille, France. 
juan.iovanna@inserm.fr 


Replace ‘pathogens’ 
with ‘perceptogens’ 


We argue for a more 
sweeping reappraisal of the 
term pathogen than Arturo 
Casadevall and Liise-anne 
Pirofski propose (Nature 516, 
165-166; 2014). This should 
take in not just microbes, but 
the wider ‘exposome’ and recent
discoveries in infection and 
immunity research. 

A term is needed that 
encompasses sequences 
from the environment — 
intrinsic or extrinsic — that 
impart pathogenic or benign 
information to eukaryotic 
immune receptors. For example, 
T-cell receptors that recognize 
autoantigen and microbial 
sequences can be triggered 
by related peptide sequences 
from diverse sources in the 
environment (M. E. Birnbaum 
et al. Cell 157, 1073-1087; 2014). 


Substituting ‘microbial
immunogen’ for ‘pathogen’
would not account for microbiota 
sequences that instruct immune 
development rather than elicit 
protective immunity. We suggest 
instead the term perceptogen 
(microbial or environmental) 
to cover protein sequences 
that affect the body’s range of 
reactions after perception by its 
immune receptors. 

As the writer Aldous Huxley 
remarked: “There are things 
known and there are things 
unknown and in between are the 
doors of perception.” 

Danny Altmann, Rosemary 
Boyton Imperial College London, 
UK. 

d.altmann@imperial.ac.uk 


Learning from 
Typhoon Haiyan 


In our view, the communication 
of disaster risk during Typhoon 
Haiyan, which struck the 
Philippines in November 2013, 
could have been better. 

The typhoon was one of the 
strongest tropical storms ever 
to make landfall, registering 
category 5 on the Saffir-Simpson 
scale. Despite forecasts of winds 
of more than 300 kilometres per 
hour and a predicted 7-metre 
storm surge, the city of Tacloban 
was caught underprepared: 
thousands died from the 
inundation. 

The storm surge was predicted 
in a report by the Philippine
Atmospheric, Geophysical 
and Astronomical Services 
Administration (PAGASA) that 
was sent to local agencies and 
communities. Unfortunately, 
it was simply a line at the end 
of a routine weather bulletin. It
was apparently not otherwise 
highlighted, elaborated on or, in 
our opinion, in any way explained 
in order to transmit its urgency to 
key agencies and the public. 

After interviewing agency
personnel, we concluded that
a well-intended adherence
to routine and pro forma
communication could have
been at play. Feedback loops for
conveying tacit information (for
example, the implications of
modelling outputs) seem to have
been inadequate.

PAGASA’s Tacloban team
stayed in its single-storey coastal 
office, which was demolished by 
the storm surge, claiming a team 
member's life. 

Many other factors influenced 
the impact of Haiyan, but this 
example indicates that routines 
need to adapt to deal with extreme 
events that lie beyond personal 
and institutional memory. 

Raul Lejano New York 
University, New York, USA. 
Joyce Melcar Tan, Meriwether 
Wilson University of Edinburgh, 
UK. 

lejano@nyu.edu 


Build neuroscience 
capacity in Africa 


The non-profit organization 
TReND is funding a 
neuroscience training initiative 
so that Africa’s scientists can join
this rapidly evolving research 
field (see www.trendinafrica.org). 

TReND (for ‘Teaching and
Research in Neuroscience
for Development’) is run by
volunteer researchers at several 
universities worldwide. It 
organizes outreach courses and 
workshops for young African 
scientists on how to conduct 
quality, affordable neuroscience 
research in resource-limited 
settings. In 2010-14, more than 
1,000 African students took part 
in TReND programmes. 

The organization provides 
students with used lab equipment 
from universities, hospitals 
and companies in developed 
countries, as well as open-source 
software and hardware. 

We call on industry and 
governments for more investment 
in such activities to improve 
science education and promote 
economic development in Africa. 
Fanuel Muindi, Joseph Keller 
Massachusetts Institute of 
Technology, Cambridge, USA. 
fmuindi@mit.edu
OBITUARY


Mary F. Lyon
(1925-2014)

Grande dame of mouse genetics.


In 1961, Mary Frances Lyon proposed in
Nature that one of the two X chromosomes
in every cell of female mammals
is inactivated. This, she argued, occurs to
prevent XX female cells from expressing
twice as many X-linked gene products as
XY male cells.

Lyon’s X-chromosome inactivation
hypothesis had profound implications for
clinical genetics and developmental biol-
ogy. For instance, it helped researchers to
elucidate the genetic basis of many X-linked
diseases, such as Duchenne muscular
dystrophy. It also led, 30 years later, to the
discovery of the Xist gene, which helped to
spawn a whole field of research on the role
of long non-coding RNA molecules in regu-
lating gene expression. The non-coding Xist
RNA is the master switch that turns off the X
chromosome. It also explains some everyday
phenomena, such as the patchy colouring of
tortoiseshell cats.

Lyon died aged 89, on Christmas Day
2014, after drinking a glass of sherry, eating
her Christmas lunch and settling down in
her favourite chair for a nap. She was born
in Norwich, UK, in 1925, the eldest of three
children of a teacher mother and civil-
servant father. An inspirational teacher at
her grammar school sparked her interest in
biology and in 1943, when women were
awarded only ‘titular’ degrees, she went to
Girton College, University of Cambridge, to
study zoology.

In 1946, Lyon started her PhD at
Cambridge with the geneticist R. A. Fisher
in the emerging field of mouse genetics. In
1948, in pursuit of better histology facilities
to complete her doctoral studies, Lyon moved
to the Institute of Animal Genetics at the Uni-
versity of Edinburgh, at the time headed by
the embryologist C. H. Waddington, another
figure whose work profoundly influenced her.

Lyon stayed in Edinburgh to work on a pro-
ject funded by the Medical Research Council
(MRC) to study mutagenesis in mice under
the geneticist Toby Carter: in the 1940s there
was widespread concern about the possibil-
ity of atomic-weapons testing causing muta-
tions. In 1955, Carter transferred his group
to the MRC Radiobiology Unit at Harwell,
UK, where there was more space for mouse
breeding. It was during this period that Lyon
observed the patchy coats of female mice car-
rying X-linked coat-colour mutations. This,
coupled with the knowledge that female
mice carrying a single X chromosome are
viable, led her to propose the hypothesis of
X-chromosome inactivation.

Apart from a short sabbatical in 
Cambridge, Lyon remained at Harwell for the 
rest of her career. From 1962 she was head of 
the genetics division, which became an inde- 
pendent Mammalian Genetics Unit in 1995. 

Despite some initial tussles with the MRC 
over the amount of time and resources to 
devote to ‘ancillary projects’ in developmen- 
tal genetics — rather than to establishing the 
hazards of radiation and other mutagenic 
agents — Lyon managed to pursue both. 

Throughout her six decades of work on 
mice, her favourite chromosome, aside from 
the X, was 17. Chromosome 17 encodes 
the t-complex, a genetic anomaly found in 
certain wild mice that gives rise to different
‘t-haplotypes’, which consist of different
DNA rearrangements. Certain t-haplotypes 
are preferentially transmitted by males to 
their offspring; mice carrying two copies of 
the same t-haplotype are either not viable 
or sterile. By carrying out a series of clever 
genetic crosses, Lyon worked out what was 
going on. This work made a major contri- 
bution to the understanding of phenomena 
such as non-Mendelian inheritance (the 
abnormal segregation of chromosome pairs 
from the expected one-to-one ratio) and 
the effect that inversions (when a segment
of a chromosome is reversed) have on
suppressing chromosomal recombination.

Lyon was a central figure in twentieth- 
century mouse genetics. She laid the intellec- 
tual foundations and developed the genetic 
tools for the use of mice as model organisms 
in molecular medicine, cell and developmen- 
tal biology and in deciphering the function 
of the human genome. Lyon was editor of 
Mouse News Letter from 1956 to 1970, a pub- 
lication that had a key role in establishing a 
mouse-focused research community in the 
pre-Internet age. She also helped to develop 
a common language for the field by chair- 
ing the Committee on Standardised Genetic 
Nomenclature for Mice from 1975 to 1990. 
Her pivotal contribution was recognized 
by the naming of the Mary Lyon Centre, 
an international facility for mouse-genetic 
resources, opened at Harwell in 2004, and by 
the creation of the Mary Lyon Medal by the 
UK Genetics Society in 2014. 

Because everything Mary said was so care- 
fully thought through, she could be difficult 
to talk to: on the phone, it was easy to think 
you had been cut off. She did not suffer fools 
gladly, but was a great supporter of the bright 
young scientist, often eschewing authorship 
of publications to enhance the profile of jun- 
ior collaborators. She was intellectually rigor- 
ous but not dictatorial. When I began my PhD 
with her in 1977, she gave me a handful of 
papers, showed me the genetic tools — mice 
carrying the various mutations and chro- 
mosomal rearrangements — and said, “do 
something on X-inactivation”. That degree of
academic freedom was exhilarating, coupled 
as it was with the safety net of robust critique. 

Among numerous other honours, Mary 
was a foreign associate of the US National 
Academy of Sciences and was the 28th
woman to be elected a fellow of the
Royal Society in London in 1973. She might 
have been elected sooner had leading geneticist
Hans Grüneberg not initially disbelieved
the Lyon hypothesis. It is perhaps surprising 
that Mary did not receive any establishment 
honours, but bureaucracy, politics and net- 
working were alien to her. 

Her first love was mice, although she always 
had a cat: a tortoiseshell, of course.


Sohaila Rastan is executive director of 
biomedical research at Action on Hearing 
Loss in London, UK. She was Mary 
Lyon's second PhD student at the MRC 
Radiobiology Unit in Harwell, UK. 
e-mail: sohaila@rastan.fsnet.co.uk
CHEMICAL BIOLOGY


How to minimalize antibodies 


The success of antibodies as pharmaceuticals has triggered interest in crafting much smaller mimics. A crucial step forward 
has been taken with the chemical synthesis of small molecules that recruit immune cells to attack cancer cells. 


CHRISTOPH RADER 


Over the past two decades, a wave
of biological medical products, or
biologics, dubbed monoclonal antibodies
(mAbs) has conquered the pharmaceutical
armamentarium. There are more
than 30 mAbs marketed for treating cancer, 
autoimmune diseases and other serious med- 
ical conditions, and a similar number are in 
late-stage clinical trials’. At least five mAbs 
have each garnered more than US$5 billion in 
annual revenue. Small molecules that mimic 
the pharmacological properties of mAbs 
therefore have the potential to become highly 
competitive drugs. Writing in the Journal of the 
American Chemical Society, McEnaney et al. 
provide evidence that this is an achievable goal. 

The success of mAbs as pharmaceuticals 
is remarkable, given their size, composition 
and heterogeneity (mAbs are populations of 
similar, but not identical, molecules). They are 
more than 100 times larger than conventional 
small-molecule drugs manufactured using 
chemical synthesis, and therefore require 
more-expensive and less-precise biological 
synthesis. 

Tripartite Y-shaped antibodies evolved 
as a cornerstone of the vertebrate immune 
system. They can selectively and tightly bind to 
foreign molecules with their two Fab regions 
(targeting functions) and recruit compo- 
nents of the host immune system with their 
Fc region (effector functions). Furthermore, 
the Fc region mediates recycling of the anti- 
body molecule, resulting in its retention in the 
blood (giving it a prolonged circulatory half- 
life). In the treatment of cancer, mAbs use their 
tripartite architecture to bring immune cells 
readied for the kill in close proximity to cancer 
cells (Fig. 1a). 

These three principal features of antibody 
molecules bestow pharmacological proper- 
ties on mAbs that are unmatched by small 
molecules. But the complexity of mAbs has 
prevented generic versions of branded drugs 
from being produced by competitor compa- 
nies, and slowed the production of similar 
versions — providing a strong investment 
incentive for pharmaceutical companies, but 
potentially driving up health-care costs. By 
contrast, small molecules that mimic mAbs

Figure 1 | Bridging target and effector cells. a, Tripartite antibody molecules bridge target and effector
cells by simultaneously engaging two structures (antigens) on the target-cell surface with their Fab 
regions and one effector-cell surface receptor with their Fc region. b, Small molecules that mimic Fab 
regions can be docked to whole antibodies (not shown) or Fc regions (depicted) to produce molecules 
that have the pharmacological properties of antibodies. c, McEnaney et al.” now report a quadripartite 
synthetic antibody that is 20 times smaller than naturally occurring antibody molecules and readily 
generated by chemical synthesis. Cartoon versions of the antibody and the small molecules are 
approximations of the actual molecular structures and not drawn to scale. 


would have lower manufacturing costs and 
enable competition from generics. And unlike 
mAbs, which often trigger immune responses 
in patients, small molecules are not immuno- 
genic. Moreover, they can penetrate tissues 
and cells more efficiently, can reach buried 
sites on target molecules that are inaccessible 
to mAbs, can be given orally and have a longer 
shelf life. 

The discovery and development of peptides, 
peptidomimetics and other small molecules 
that have a specificity and affinity for biologi- 
cal targets comparable to those of mAbs have 
been key in efforts to replace mAbs by small 
molecules. In order for them also to have the 
effector functions and prolonged circulatory 
half-life of mAbs, these small molecules were 
designed to dock to antibodies either in vitro, 
yielding chemically programmed antibod- 
ies’, or in vivo, producing antibody-recruiting 
molecules*. Although the resulting hybrid
molecules (Fig. 1b) combine several of the
advantages of mAbs and small molecules, their 
biological component still restricts the scope of 
their chemical component. 

A provocative question has therefore been 
whether small molecules can copy both the 
Fab- and Fc-mediated pharmacological prop- 
erties of mAbs. This is theoretically possible if 
Fab-mimicking small molecules can be fused 
to other small molecules that bind to Fc recep- 
tors. McEnaney et al. now deliver this missing 
link. They used chemical synthesis to combine 
a known Fab-mimicking small molecule that 
binds to a cell-surface receptor on prostate- 
cancer cells with a known Fc-mimicking cyclic 
peptide that selectively binds to an Fc receptor 
called CD64 on immune cells. The resulting 
compound mimics two of the three principal 
natural features of antibodies. 

Bridging two cell-surface receptors is a 
formidable task for a small molecule. It
requires a linker between the Fab- and
Fc-mimicking components that has sufficient 
length, solubility and rigidity. The authors 
used computer modelling to predict that 
less than one-third of the naturally occur- 
ring distance between Fab and Fc regions in 
an antibody is required to simultaneously 
engage receptors on two different cells, and 
they used this information to design their 
linker. They also found that two copies of 
each Fab- and Fc-mimicking component are 
essential for efficiently mediating targeting 
and effector functions in vitro. The result is a 
quadripartite molecule (Fig. 1c) that resem- 
bles tripartite antibodies with respect to com- 
position and function, but which is 20 times 
smaller, homogeneous (all the molecules are 
the same) and readily generated by chemical 
synthesis. Encouragingly, McEnaney et al. 
observed that this compound induces immune 
cells to engulf and ingest prostate-cancer 
cells in vitro. 

Although in vivo validation studies have 
still to be performed, it seems that McEnaney 
et al. have taken a pivotal step towards obtain- 
ing antibody-mimicking small molecules 
that avoid some of the liabilities of biologics. 
But there is more work to be done. First, the 
quadripartite molecule is about 7 kilodaltons 
in size — substantially smaller than antibod- 
ies, but still larger than conventional small- 
molecule drugs (less than 1 kDa), which limits 
most of the potential advantages discussed 
earlier. However, the molecular weight can 
conceivably be cut further by replacing the 
relatively large Fc-mimicking cyclic peptide 
with a peptidomimetic or other small mol- 
ecule. Then again, an intermediate-sized syn- 
thetic antibody mimic might be a good thing, 
because unrestrained access of small molecules 
to intra- and extracellular nooks and crannies 
could make their activity and toxicity profiles 
unpredictable. 

Second, the Fc-mimicking component of the 
synthetic antibody mimic binds to only one 
kind of Fc receptor (CD64), whereas natural 
antibodies and mAbs engage other Fc receptors,
including CD16 and CD32, on a variety of
functionally different effector cells. Moreover, 
the Fc region of antibodies triggers activation 
of the complement cascade, which is an addi- 
tional mechanism of target-cell destruction, 
and it also mediates prolonged circulatory 
half-life — effects that generally have been dif- 
ficult to produce with small molecules. Even 
so, synthetic antibody mimics that engage only 
one kind of Fc receptor might allow effector 
functions to be fine-tuned. The modular, ver- 
satile design of McEnaney and co-workers’ 
molecules will also allow their properties to be 
tailored through chemical synthesis, for exam- 
ple to include a peptide or peptidomimetic that 
is retained in the blood by binding to circulat- 
ing albumin protein’. 

Third, Fab-mimicking small molecules are 
still limited in scope when compared with
mAbs, which can be generated and evolved to
bind to almost any cell-surface receptor selec- 
tively and tightly’. Nonetheless, our ability to 
generate and screen large chemical libraries, 
which are structurally much more diverse than 
biological libraries, has afforded access to an 
increasing number of small molecules …

…ponents of biologics are not sitting
idle. Antibody engineers have generated a
large variety of antibody molecules that have 
improved targeting and effector functions. 
For example, a new class of ‘bispecific’ anti- 
body’ can recruit and activate T cells, which 
are particularly potent effector cells that can- 
not be directly engaged by natural antibod- 
ies and mAbs. Although not as miniaturized 
as synthetic antibody mimics, these bispe- 
cific antibodies are three times smaller than 
mAbs and can be clinically potent, safe and 
profitable, as demonstrated by the recently
marketed anticancer drug blinatumomab’.
Intriguingly, however, synthetic antibody 
mimics might be better copies of these T-cell- 
engaging biologics than of conventional 
mAbs, because the biologics bind to just one 
kind of effector-cell receptor (CD3) and do 
not need prolonged circulatory half-lives. All 
things considered, synthetic antibody mimics 
have the potential to become a new class of 
pharmaceutical.


The slippery base
of a tectonic plate 


High-resolution imaging of the base of the Pacific plate as it descends beneath 
New Zealand discloses a 10-kilometre-thick channel that decouples the plate 
from underlying upper mantle. SEE LETTER P.85 


CATHERINE A. RYCHERT 


In the theory of plate tectonics, the outer
shell of the Earth, known as the lithosphere,
consists of several rigid plates, which move
relative to each other over the weaker, flowing
asthenosphere. The bottom of the lithosphere,
the lithosphere-asthenosphere boundary
(LAB), is fundamental to our understanding
of how plate tectonics works, although an exact
understanding of the mechanism that gives the
plates their rigidity and defines their thickness
remains elusive and widely debated. On
page 85 of this issue, Stern et al.' describe how
they have used reflected seismic waves generated
by explosive sources in steel-cased boreholes
to image the Pacific plate as it descends
beneath New Zealand. They find a LAB that
is less than 1 kilometre thick at the top of a
10-km-thick channel, in which slow seismic
velocities may require the presence of water
or melt (Fig. 1). The authors suggest that the
thin channel decouples the lithosphere from
the asthenosphere and allows plate tectonics
to take place. The existence of such a localized
channel probably has implications for the
driving forces of plate tectonics and mantle
dynamics.

© 2015 Macmillan Publishers Limited. All rights reserved

Plate tectonics has been a fundamental tenet 
of Earth science for almost 50 years. It is the 
foundation of modern Earth science, and pro- 
vides a framework for our understanding of the 
formation of the continents and ocean basins, 
and the evolution of the planet. However, ques- 
tions remain, such as, where is the base of a 
plate and what makes a plate ‘plate-like’? There 
are many proxies used to estimate the depth 
and nature of the base of tectonic plates, but so 
far no consensus has been reached. The transi- 
tion from the rigid lithosphere to the flowing 
asthenosphere has classically been defined 
by temperature. Temperature has a large 
effect on the viscosity of rocks — their ability 
to flow. 

If temperature were the sole mechanism
governing the definition of the plate,
then we would expect a gradual transition
from the lithosphere to the asthenosphere. 
However, in the past decade the high resolu- 
tion provided by imaging techniques based 
on measurements of seismic waves scattered 
from the base of the lithosphere has revealed 
that the transition from the lithosphere to the 
asthenosphere is sharp’. This suggests that 
another mechanism such as the presence of 
water or melt must exist in the asthenosphere, 
which would weaken it*’, and thus necessarily 
define the LAB’. 

Stern et al. used data from explosive-source 
seismic waves that travelled deep into the 
Earth and were reflected back to the surface, 
where they were recorded by seismometers in 
New Zealand. The seismic waves vibrated at 
high frequency, allowing the authors to image 
the low-seismic-velocity channel and also to 
deduce the thickness of the LAB. The deduced 
LAB thickness is one of the tightest constraints 
so far on the transition from the lithosphere to 
the asthenosphere. Similarly significant is the 
reported thickness of the 120-million-year-old 
Pacific plate at 73 ± 1 km, much thinner than
predicted by the classic thermal model of 
conductive cooling of the oceanic lithosphere 
with time’. Finally, an increase in seismic-wave 
velocity about 10 km deeper and parallel to the 
LAB is interpreted as the base of a decoupling 
channel. 
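
For a sense of scale, the comparison with the "classic thermal model" can be approximated with half-space conductive cooling, in which the depth of a given isotherm grows with the square root of plate age. The sketch below is an illustration only and is not the calculation used by Stern et al.: the thermal diffusivity (1e-6 m^2/s), the choice of the isotherm at 90% of the mantle temperature contrast as the plate base, and all function names are assumptions made here.

```python
import math
from statistics import NormalDist

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def erfinv(x: float) -> float:
    """Inverse error function, obtained from the standard normal quantile."""
    return NormalDist().inv_cdf((x + 1.0) / 2.0) / math.sqrt(2.0)


def halfspace_thickness_km(age_myr: float, kappa: float = 1e-6,
                           theta: float = 0.9) -> float:
    """Depth (km) where (T - T_surface) / (T_mantle - T_surface) equals theta.

    Half-space cooling gives theta = erf(z / (2 * sqrt(kappa * t))).
    kappa (m^2/s) and theta are assumed, illustrative values.
    """
    t_seconds = age_myr * 1e6 * SECONDS_PER_YEAR
    return 2.0 * erfinv(theta) * math.sqrt(kappa * t_seconds) / 1000.0


thermal_km = halfspace_thickness_km(120.0)  # plate age quoted in the text
observed_km = 73.0                          # thickness reported in the text
print(f"half-space estimate: {thermal_km:.0f} km, observed: {observed_km:.0f} km")
```

Under these assumptions the purely thermal estimate comes out at roughly twice the reported 73 km. Other conventions for the plate base (a different isotherm, or a plate model rather than a half-space) change the number but not the direction of the mismatch.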

If the deduced LAB represents the base of 
the plate, the plate’s thinness may explain the 
enigmatic observed lack of subsidence for sea 
floor that is more than 70 million years old’. 
However, whether or not it is the plate base 
depends on the mechanism responsible for 
the authors’ observations. A seismic-velocity 
discontinuity imaged beneath the Pacific 
plate at similar depths is sometimes inter- 
preted as anisotropic fabric’ "’ — the direc- 
tional dependence of seismic-wave velocity. 
A purely anisotropic interpretation for the 
observed seismic-velocity discontinuity 
would not necessarily equate it with the LAB 
because anisotropic fabric could be frozen into 
the plate from a previous episode of deforma- 
tion. Although not impossible here, an exotic 
anisotropic fabric would probably be required 
that may not be consistent with typical notions 
of horizontal fast directions in the lithosphere 
(see, for example, ref. 12). 

Therefore, the authors are left with the 
hypothesis that water or melt is present in 
the channel, which would weaken the man- 
tle** and define the base of the plate’. An 
increase in hydration’’ with depth could 
be related to the shallow dehydration that 
occurs during plate formation at a mid-ocean 
ridge'* (Fig. 1), whereas melt could be caused 
by complex mantle flow from subduction 
tectonics and/or melt ponding’*. Further 
investigation is needed to find the origin of 
any existing melt, because normal oceanic 
lithosphere is predicted to be cold at a depth 
of 73 km and so is not necessarily predicted
to melt. In this case, a steady supply of melt
from greater depths in the mantle to the base
of the plate would be required, given that
the melt might travel up along the base of
the plate.

Figure 1 | A model of the subducting Pacific plate. Stern et al.' used seismic waves that travelled deep
into the Earth and were reflected back to the surface to image the Pacific plate as it subducts beneath
New Zealand. The seismic data reveal a plate that is 73 kilometres thick, significantly less than that
predicted by conventional conductive-cooling models of the oceanic lithosphere (dashed line). The
base of the plate, the lithosphere-asthenosphere boundary (LAB), is sharp (less than 1 km thick) and is
underlain by a 10-km-thick water- or melt-rich channel that decouples the plate from the underlying
flowing asthenosphere. The extent to which the channel continues beneath the ocean plate and/or varies
in depth has yet to be determined, as indicated by the question mark. The arrows show the directions of
motion in the different Earth layers. The plate may be relatively dehydrated, possibly as a result of melting
beneath the mid-ocean ridge. The depth extent of the melt triangle would determine the thickness of
the dehydrated layer (dotted line). Melt or water may travel to the channel from greater depths. The
mechanism by which a 10-km-thick channel might form is unclear.

The very existence of the channel itself is 
more of an enigma. How and why channelization
would occur over a 10-km depth range
is not known. Perhaps water availability from 
phase transformations”® or melt ponding” 
occurs over a certain depth range. It could be 
specific to locations at which plate motions 
deviate from mantle flow, as is the case off both 
New Zealand and Costa Rica”, where a simi- 
lar channel was reported’*. Overall, channels 
offer an explanation for some of the elusive 
nature of the LAB. Narrow channels would be 
nearly imperceptible in seismic imaging meth- 
ods that rely on low-frequency waves, which 
might explain intermittent and discrepant LAB 
detection among methods”. For a full under- 
standing of such channels, we need better con- 
straints on where they exist. 

However, global channel imaging may prove 
difficult. Anisotropy is probably important at 
the LAB, and may bias results if it is not prop- 
erly considered. In addition, high seismic- 
wave frequencies are needed to distinguish 
fine-scale channels, although studies such as 
those of Stern and colleagues are not feasible 
at a global scale. Finally, what are the implica- 
tions of these channels for the coupling of the 
plates to the underlying asthenosphere and 
the driving forces of plate tectonics? Tackling 
these questions will require incorporating
tight seismic constraints with laboratory
experiments and geodynamical modelling.


CELL BIOLOGY


Organelles under
light control


Optogenetic techniques enable light-activated control of protein-protein 
interactions in the cell. This approach has now been used to alter membrane 
dynamics and induce cellular reorganization. SEE LETTER P.111 


FRANCK PEREZ 


The internal organization of cells is
finely tuned, and their intracellular
dynamics ever-changing. This is particularly
apparent in plant, animal and fungal
cells, which contain specialized membrane- 
bound vesicles and organelles in which par- 
ticular reactions occur. Studies show’ that 
organelles must be correctly positioned 
to ensure proper cellular function, but it 
has so far been difficult or impossible to 
suddenly and reversibly alter the positions of 
vesicles and organelles in cells. In this issue, 
van Bergeijk et al.’ (page 111) describe a tech- 
nique that provides biologists with tools that 
can be thought of as light-responsive tweezers,
enabling precise and rapid control of organelle 
positioning and movement in cells.

Organization of the cell’s membranes
depends on the cytoskeleton, and in parti- 
cular on cytoskeletal microtubule and micro- 
filament structures, which are involved 
in intracellular transport. Cells use these 
cytoskeletal networks, along with dedicated 
motor proteins such as kinesins, dyneins and 
myosins” , to distribute and position different 
organelles in distinct subcellular regions’. 
For example, organelles called peroxisomes,
which break down fatty-acid chains, are often 
distributed in the region around the nucleus. 
Early endosomes involved in cellular uptake of 
various cell-surface proteins are located at the 
cell periphery, whereas endosomes involved 
in molecule recycling are located at the cell 
centre. The mitochondria, which generate 
ATP molecules, are radially distributed along 
microtubules in the cytoplasm. Organelle
positioning can be even more extreme in
highly polarized cells such as neurons and
migrating cells, in which specialized functions
must be sustained in specific parts of the cell,
for example signalling at synaptic junctions or
polarized secretion.

Figure 1 | A light touch. The position of subcellular structures such as organelles and vesicles is
regulated by microtubules that are part of the cell’s cytoskeleton. A system developed by van Bergeijk
et al. can be used to move organelles to different positions within the cell in response to light. The authors
fused motor proteins to ePDZb1-protein domains, and organelle-associated proteins to a modified
LOV-protein domain. The modified LOV domain changes conformation when exposed to blue light,
unmasking an ePDZ-binding motif. Thus, following illumination, the modified LOV domain binds to
ePDZb1. The organelle becomes tethered to the motor protein and its cellular position is altered by the
motor protein’s movement along microtubules (in the case shown here, from around the nucleus to the
cell’s periphery).

Optogenetic tools can regulate rapid and 
reversible interactions between selected 
protein domains in specific areas of a cell or 
tissue, in response to laser illumination*”’. 
In the current study, van Bergeijk et al. used 
these tools to induce the rapid recruitment of 
molecular motors to specific membranes. Sev- 
eral optogenetic systems currently exist, and 
the authors mostly used the ‘LOV-ePDZb1’
set-up®. This system is based on a modified 
light-oxygen-voltage (LOV) protein domain. 
The modified LOV domain changes confor- 
mation following illumination with blue light, 
blocking or unmasking an amino-acid motif 
through which the modified LOV interacts 
with another protein domain, ePDZb1. 

The authors fused the modified LOV 
domain to proteins located on the mem- 
branes of specific organelles — PEX3 for 
peroxisomes, Rab11 for recycling endosomes
and TOM20 for mitochondria. They fused 
the ePDZb1 domain to a motor protein, 
such as the kinesin KIF1A, which trans- 
ports proteins to the cell periphery, or the 
BICD2 dynein-recruitment domain, which 
pulls proteins towards the cell centre. The 
myosin Vb motor-protein domain was also
used to move membranes in neuronal projections
called dendrites. Thus, blue light induced indirect
tethering of an organelle of interest to
a desired motor protein, leading to move- 
ment of the organelle around the cell (Fig. 1). 
Conversely, van Bergeijk and colleagues tran- 
siently immobilized organelles and vesicles by 
fusing ePDZb1 either to the syntaphilin pro- 
tein, which stably binds to microtubules, or 
to myosin-Vb, which stably binds the cyto- 
skeleton in non-polarized cells. 

Next, the researchers showed that this tech- 
nique could alter organelle position almost 
instantaneously, changing the cell’s organiza- 
tion in a matter of minutes. The reversibility 
of the system meant that it was possible to use 
intermittent cycles of illumination to study 
both organelle movement and the restoration 
of normal positioning. Laser illumination 
could be targeted to spots as small as 250 nano- 
metres wide, as well as to much larger areas, 
meaning that perturbation of organelle posi- 
tioning, or sudden immobilization of transport 
vesicles, could be achieved in specific cellular 
subdomains. 

This system has many applications, as 
van Bergeijk and co-workers demonstrated. 
For example, they used local illumination to 
recruit BICD2 or KIF1A to Rab11-associated
recycling endosomes in neuronal cells. This 
respectively decreased or increased the quan- 
tity of recycling endosomes in the growth 
cone (a structure located at the tip of growing
neuronal extensions, and in particular at the
extremity of projections called axons). It has 
been shown*” that Rab11 endosomes are 
involved in axon growth, but the authors’ 
analysis goes further, showing that the pres- 
ence of recycling endosomes in the growth 
cone is directly correlated with axon extension. 

The authors also used their system to 
test the ‘tug-of-war’ model for positioning 
organelles in neuronal protrusions called 
dendritic spines. This model states that a 
balance between stable tethering and motor- 
driven forces is essential to regulate the posi- 
tioning of organelles. Using their system, the 
researchers confirmed this model and defined 
the precise role of particular motor proteins 
and anchoring factors in polarized organelle 
trafficking. In summary, van Bergeijk and 
colleagues have designed a powerful tool 
that, with its high spatio-temporal resolu- 
tion, is a spectacular example of the ability of 
optogenetics to alter processes in real time in 
chosen subcellular areas. 

Technological revolutions have often 
provided the tools with which to analyse 


cellular processes from a different point of 
view. Examples include the advent of RNA 
interference, super-resolution fluorescent 
imaging, and electron microscopy and its 
subsequent improvements, all of which were 
instrumental in helping cell biologists to 
reimagine the cell. The next challenge is not 
only to improve existing tools, but also to 
develop additional approaches to asking new 
questions in a comprehensive and integrated 
manner. 

Optogenetic strategies, including van 
Bergeijk and co-workers’ technique, will have 
a major part to play here. For example, the 
quantitative spatio-temporal data that these 
techniques can generate will be of great use 
to fields such as systems biology and theoreti- 
cal modelling. The study of cell biology at the 
tissue or whole-organism level will similarly 
benefit from such an approach, because it 
will be possible to suddenly change the posi- 
tions and dynamics of specific organelles in 
particular cell types, and then monitor induced 
defects. Gene editing now allows us to create 
modified versions of key cellular regulatory 


Three-dimensional 
printed electronics 


Can three-dimensional printing enable the mass customization of electronic 
devices? A study that exploits this method to create light-emitting diodes based 
on ‘quantum dots’ provides a step towards this goal. 


JENNIFER A. LEWIS & BOK Y. AHN 


he ability to rapidly print three-dimen- 
sional (3D) electronic devices would 
enable myriad applications, including 
displays, solid-state lighting, wearable elec- 
tronics and biomedical devices with embed- 
ded circuitry. Writing in Nano Letters, Kong 
et al.' report an intriguing route to this goal 
by creating fully 3D-printed light-emitting 
diodes (LEDs) based on quantum dots. Quan- 
tum dots are semiconducting nanocrystals 
that exhibit tunable colour emission” *. Using 
a 3D-printing method based on extruding 
multiple materials, the researchers patterned 
quantum-dot-based LEDs (QD-LEDs) on 
curved surfaces and integrated arrays of the 
diodes in 3D matrices. 
3D printers transform the output files from 
computer-aided design tools into tangible 
objects using pattern-generating devices that 
move along multiple directions in space’. 
These devices can be light sources that harden 
resins or fuse powders, or nozzles that directly 
deposit materials. Since their introduction 


nearly three decades ago, 3D-printing methods 
have been used to build myriad objects, 
primarily prototypes, in a sequential, layer- 
by-layer fashion. 

To create 3D objects of arbitrary form and 
specific function, a broad palette of materials 
and multi-material printing platforms 
are required. One promising approach is 
3D extrusion printing’ , in which functional 
inks are deposited through fine cylindrical 
nozzles under an applied pressure at ambient 
conditions. Unlike 3D printers that use inkjet 
print heads, which are suitable only for inks 
with a narrow range of viscosities (about 
ten times that of pure water), extrusion-based 
printing enables materials of widely varying 
composition and viscosity to be patterned’. 

QD-LEDs are multilayer devices built 
around an active (light-emitting) layer com- 
posed of quantum dots*. Multiple QD-LED 
layer architectures have been explored with 
the aim of optimizing their external quantum 
efficiency, that is, the ratio of the number of 
photons emitted by the QD-LED relative to the 
number of electrons injected into the device 
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factors”. Combining optogenetic development 
with gene editing will enable us to control cell 
organization precisely and to question its role 
in cellular function. A bright future awaits cell 
biology. = 
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when an electric field is applied between the 
device's outer metallic electrodes (cathode 
and anode layers). In the typical embodiment 
of a QD-LED, the active layer is sandwiched 
between layers of electron- and hole-trans- 
porting materials, where holes are positively 
charged carriers. The applied electric field 
causes electrons and holes to move into the 
active layer, where they recombine to emit 
photons. 
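
The definition of external quantum efficiency given above (photons emitted per electron injected) can be turned into a short calculation. The sketch below is illustrative only: it assumes monochromatic emission, and the drive current, optical power and wavelength are hypothetical numbers chosen for the example, not values reported by Kong et al. The function name is likewise an assumption.

```python
PLANCK = 6.62607015e-34             # Planck constant, J s
LIGHT_SPEED = 2.99792458e8          # speed of light, m/s
ELEMENTARY_CHARGE = 1.602176634e-19 # elementary charge, C


def external_quantum_efficiency(optical_power_w: float,
                                wavelength_m: float,
                                current_a: float) -> float:
    """Photons emitted per second divided by electrons injected per second.

    Assumes all emitted light is at a single wavelength (a simplification).
    """
    photon_energy_j = PLANCK * LIGHT_SPEED / wavelength_m
    photons_per_s = optical_power_w / photon_energy_j
    electrons_per_s = current_a / ELEMENTARY_CHARGE
    return photons_per_s / electrons_per_s


# Hypothetical device: 20 microwatts of 620-nm light at 1 mA drive current.
eqe = external_quantum_efficiency(20e-6, 620e-9, 1e-3)
print(f"EQE = {eqe:.2%}")
```

With these made-up inputs the efficiency is about 1%. A real measurement would integrate over the emission spectrum and account for how much of the emitted light is actually collected.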

Solution-based processing routes have 
recently emerged for patterning QD-LEDs 
with the aim of lowering fabrication costs”*. 
Central to this approach is the ability to main- 
tain highly uniform layers between dissimilar 
materials. To create their QD-LED devices, 
Kong and colleagues sequentially printed several
materials (Fig. 1). First, a conductive silver
ring surrounding a transparent anode layer,
followed by a hole-transport layer, were printed
and annealed at 200 °C (silver ring) and 150 °C
(other layers). Next, the active layer was
formed by printing quantum dots suspended 
in a solvent mixture in a drop-wise manner. 
As solvent evaporation ensued, recirculating 
fluid flow suppressed quantum-dot migra- 
tion to the drop edge, yielding relatively 
uniform, active layers’. Notably, each layer 
was patterned using immiscible solvents 
to minimize interlayer mixing. Finally, a 
cathode layer composed of liquid metal was 
printed’’ and the devices were packaged in a 
silicone sealant. 

To highlight the flexibility of their approach, 
the authors printed QD-LEDs in multiple for- 
mats, including green and orange-red light 
emitters, 2 × 2 × 2 arrays embedded in a silicone
matrix, and QD-LEDs on the surface of
contact lenses and other substrates of interest. 



Figure 1 | Fully 3D-printed quantum-dot-based light-emitting diodes (QD-LEDs). The QD-LEDs 
reported by Kong and colleagues’ consist of five layers: a conductive ring of silver nanoparticles (Ag NPs) 
that surrounds a transparent anode layer composed of poly(ethylenedioxythiophene):polystyrene 
sulfonate (PEDOT:PSS); a hole-transport layer made of
poly[N,N’-bis(4-butylphenyl)-N,N’-bis(phenyl)benzidine] (poly-TPD); a light-emitting layer composed of cadmium selenide/zinc sulfide quantum dots
(CdSe/ZnS QDs); and a cathode layer composed of eutectic gallium indium (EGaIn). The diameter of 
the printed QD-LEDs is approximately 2 mm. (Figure adapted from ref. 1.) 


The printed devices exhibit brightness, an 
essential metric of device performance, that is 
10- to 100-fold below that of the best solution- 
processed QD-LEDs**. However, substantial 
improvements in device performance are 
likely to be possible by introducing an elec- 
tron-transport layer (which was absent in the 
current architecture), such as one composed of 
zinc-oxide nanoparticles, and further optimiz- 
ing the printing process. 

The 3D-printing method used by the 
authors represents a simple, but sophisticated, 
approach for patterning functional materials. 
Demonstrated applications of this technique 
include printing electrodes that interconnect 
solar-cell and LED arrays", 3D antennas” 
and rechargeable microbatteries'*. Although 
microbatteries rely on multi-material 3D 
printing of interdigitated cathode and anode 
layers, Kong and colleagues’ study is much 
more impressive, because up to six, as opposed 
to two, different materials must be printed 
sequentially to create their devices. 

One intriguing question that arises is 
whether fully 3D-printing electronic devices 
is the best approach for creating mass- 
customized electronics. Another viable 
strategy would be to combine 3D printing with 
automated pick-and-place machinery that 
places electronic components accurately and 
repeatably to generate objects with embedded 
circuitry and devices’. LEDs are commercially 
available that have lateral dimensions akin to 
those demonstrated by Kong et al., and could 
be integrated into 3D-printed objects by this 
hybrid approach. 

To vastly expand the capabilities of 
3D printing, new functional inks and multi-nozzle
print heads and printing platforms
must be designed for rapidly and accurately
patterning materials over a broad range of
compositions and ink-flow behaviour. As
these advances are realized, it may be possible
to print customized 3D electronic devices in
a highly scalable manner. We are becoming
increasingly reliant on electronics in our daily
lives, and so successful outcomes should be of
great benefit to society.


Deep and complex ways
to survive bleaching 


Mass coral bleaching events can drive reefs from being the domains of corals to 
becoming dominated by seaweed. But longitudinal data show that more than half 
of the reefs studied rebound to their former glory. SEE LETTER P.94 


JOHN M. PANDOLFI 


A constant battle for space is fought every
minute of every day on the hard substrates
that provide the foundation for
living coral reefs. In one corner are reef corals 
and the photosynthetic dinoflagellate micro- 
algae that live in symbiosis inside them; in the 
other are fleshy macroalgae, better known as 
seaweed. On healthy reefs, corals are the clear 
winners and dominate reef substrates (Fig. 1a).
But regime shifts to macroalgae (Fig. 1b) often 
occur in response to local anthropogenic 
drivers such as overfishing of herbivores’ or 
increased nutrients’ from pollution and land- 
use changes — two conditions more favour- 
able for seaweed than for corals. On page 94 
of this issue, Graham et al.’ provide the first 


unequivocal evidence that regime shifts from 
corals to macroalgae also occur in response to 
coral bleaching, and they identify aspects of 
reef ecology that influence the likelihood of 
this occurring. 

Coral bleaching occurs when the coral hosts 
expel their symbiotic dinoflagellates, which 
provide much of the vibrant coloration typical 
of coral reefs. Corals rely on the photosynthetic 
symbionts for their energy provision, and if 
bleached corals do not rapidly regain symbi- 
onts, they die. Mass bleaching events occur 
over broad spatial scales and affect a large 
component of the reef coral community. One 
such episode, in 1998, is often referred to as the 
largest mass bleaching event on record’; in the 
Seychelles, more than 90% of live coral cover 
was lost.

Figure 1 | Changing reefs. Graham et al.’ show that mass coral bleaching events, such as the one that occurred in 1998, can drive reefs from being highly
complex, coral-rich seascapes (a) to zones of dead coral dominated by macroalgae or seaweed (b). Such regime shifts had previously been known to occur only 
in response to local stressors such as overfishing or pollution. 


Graham et al. tracked the response of coral 
and fish communities to this event across 
21 inner Seychelles islands using a 17-year 
data set that started in 1994. They found that 
9 of the 21 reefs underwent a regime shift to 
macroalgae, with the live coral cover decreas- 
ing from an average of 31% before the event 
to about 3% by 2011, and macroalgal cover 
increasing from 3% to 42% during the same 
period. Where these regime shifts occurred, 
the functional diversity of associated reef fishes 
shifted in concert with the changes in coral and 
macroalgal cover. 
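
The figures quoted in this paragraph can be put on a common footing with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are chosen here for illustration, and the derived quantities are simple ratios rather than statistics reported by Graham et al.

```python
reefs_total = 21
reefs_shifted = 9
reefs_not_shifted = reefs_total - reefs_shifted

# Share of reefs in each outcome.
share_shifted = reefs_shifted / reefs_total
share_not_shifted = reefs_not_shifted / reefs_total

# Average cover (percent) on regime-shifted reefs, before the event and by 2011.
coral_before, coral_after = 31.0, 3.0
algae_before, algae_after = 3.0, 42.0

coral_relative_loss = (coral_before - coral_after) / coral_before
algae_fold_increase = algae_after / algae_before

print(f"shifted: {share_shifted:.0%}, not shifted: {share_not_shifted:.0%}")
print(f"coral cover lost: {coral_relative_loss:.0%}, "
      f"macroalgae increased {algae_fold_increase:.0f}-fold")
```

On the shifted reefs, live coral cover fell by about nine-tenths in relative terms while macroalgal cover rose roughly 14-fold. The 12 reefs that did not shift make up a little over half of the sample.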

One of the key strengths of this study was 
its ability to test for predictors of ecosystem 
responses to the bleaching event. Graham and 
colleagues evaluated several potential factors: 
the three-dimensional structural complex- 
ity of the reef’, water depth, abundance of 
juvenile corals, nutrient load, density of herbi- 
vorous fish and whether the reefs were part of 
‘no-take’ marine reserves. The first three of 
these drivers turned out to be the most impor- 
tant. Indeed, combining structural com- 
plexity with water depth correctly predicted 
whether or not a regime shift would occur 
in 98% of cases — regime shifts occurred 
less frequently in more structurally complex 
and deeper-water habitats. These correla- 
tions bode well for our ability to predict the 
effects of future mass bleaching events, espe- 
cially in tropical regions where conservation 
resources are limited, because these two vari- 
ables can be quickly and easily measured on 
most reefs. 

Coral reefs are often portrayed as one of the 
marine ecosystems that are most vulnerable to 
the threats of climate change, and global warm- 
ing is commonly thought to be the principal 
underlying driver of mass bleaching events. 
Although Graham and colleagues’ study is 
groundbreaking in its attribution of coral-to- 
algal regime shifts to a mass bleaching event, 
perhaps their most striking finding is that, in
most cases (12 of 21 reefs), such regime shifts
did not occur. The fact that more than half of 
the reefs fully recovered after the bleaching 
event is a promising outcome for the future 
of coral reefs. It is also consistent with stud- 
ies showing that each mass bleaching leaves 
many sites unaffected, with almost complete 
recovery of corals from the 1998 event in many 
parts of the world’, and that coral survivors of 
past bleaching events have a capacity to persist 
under subsequent bleaching events’. The find- 
ings also fit with experimental work suggesting 
that corals can quickly adapt to environmen- 
tal change’®. Put simply, many reef corals just 
might be capable of adapting fast enough to 
survive current rates of global environmental 
change”"”. 

A key challenge facing reef managers around 
the world is how to protect coral reefs from 
the ‘big three’ human threats: overfishing, 
pollution and climate change. A range of spe- 
cific tools is available to tackle the first two of 
these, which are comparatively local stressors, 
but there is a paucity of appropriate climate- 
specific responses. Given the contribution of 
these local stressors to the global degradation 
of reefs, it is crucial that their management 
continues. However, Graham and colleagues’ 
delineation of reef characteristics most closely 
associated with regime shifts caused by mass 
bleaching events means that we can now take 
concrete steps towards managing specifically 
for climate change as well. For example, the 
authors’ findings suggest that structural com- 
plexity and water depth should be explicitly 
incorporated into the spatial design of marine 
reserves, with structurally complex and deep- 
water habitats targeted as high-value sites that 
will be more resistant to mass coral bleaching 
than shallower sites. 

The authors’ finding that the design of 
marine protected areas in the Seychelles had 
no bearing on the ability of reefs to rebound 
from the 1998 bleaching event is unsettling,
and is a case in point of the need for new design
approaches. But the Seychelles are not alone 
— many marine reserves only target areas that 
are important for sustaining fisheries. Perhaps 
we need to think about broadening the role of 
marine reserves to one that includes being a 
refuge from regime shifts, such that their suc- 
cess can be gauged not only by the number 
of fishes they contain, but also by the degree 
to which they protect explicit attributes of 
habitat diversity. To achieve this, Graham and 
colleagues’ messages on how to manage reefs 
in the face of climate change will need to be 
placed in a global context, and further long- 
term studies from reefs in other regions will be 
needed if we are to fully understand the drivers 
of regime shifts on reefs.

GEORGE ROFF


BIOCHEMISTRY 


Elusive source of 
sulfur unravelled 


The metabolic origin of the sulfur atom in the naturally occurring antibiotic 
lincomycin A has been obscure — until now. The biosynthetic steps involved reveal 
surprising roles for two sulfur-containing metabolites. SEE LETTER P.115 


CHARLES E. MELANGON III 


Many vital biological molecules
contain sulfur, the metabolic origins
of which can be predicted fairly
easily by those fluent in nature's biosynthetic 
language. But the sources of sulfur atoms 
are still difficult to predict for a few such 
molecules, including several with potentially 
useful biological activities, such as the anti- 
cancer agents calicheamicin’ and leinamy- 
cin’, and the antibacterial agent albomycin’. 
On page 115 of this issue, Zhao et al.* report 
the three key enzymatic steps that together 
install the sulfur atom of the antibiotic linco- 
mycin A (Fig. 1) during its biosynthesis in the 
bacterium Streptomyces lincolnensis. Using 
a combination of techniques, they reveal a 
new role for mycothiol (MSH) — a bacterial 
sulfur-containing antioxidant — as a sulfur 
donor. They also report the unprecedented 
involvement of ergothioneine (EGT), another 
sulfur-containing bacterial metabolite, in an 
enzyme-catalysed process: the chemical acti- 
vation of the carbon atom that will eventually 
bear the sulfur. 

Deciphering the details of a complex biosyn- 
thetic pathway is, in some ways, like solving a 
jigsaw puzzle that has many more pieces than 
are needed to construct the correct picture. 
The molecular pieces (enzymes and their 
encoding genes, substrates and products)
must often be carefully examined to solve the
puzzle successfully (determine the correct bio- 
synthetic pathway). Some interesting mecha- 
nisms for incorporating sulfur into molecules 
have been revealed in the past few years, 
including those involved in the biosynthesis of 
thiamine’ and the antibiotic BE-7585A (ref. 6). 

The addition of whole-genome sequencing 
and sophisticated comparative genomics to the 
biosynthetic chemist’s extensive repertoire of 
techniques has helped researchers to solve 
complex biosynthetic puzzles. Specifically, the 
method helps them to predict the relationships 
between enzyme amino-acid sequences and 
functions more accurately, and allows easier 
identification of genes involved in supplying 
biosynthetic precursors. This is particularly 
evident in genes found at chromosomal posi- 
tions distant from those of related biosynthetic 
genes, which, in bacteria, are often tightly clus- 
tered. Zhao et al. use the full complement of 
tools to discover the elusive origin of the sulfur 
atom in lincomycin A. 

As is often done in biosynthetic studies, 
the authors began by working backwards 
through the proposed pathway, attempting to 
reconstruct its molecular logic by isolating 
intermediates that accumulate when the 
activities of specific genes are disrupted. They 
proposed that lmbE, a gene present in the
sequenced biosynthetic gene cluster for linco-
mycin A that encodes a homologue of an
amidase enzyme involved in an MSH-depend-
ent detoxification process in certain bacteria,
has a role in sulfur-atom incorporation.

Sure enough, when Zhao and colleagues 
disrupted the function of lmbE, they observed
the accumulation of a lincomycin-like inter- 
mediate in which an MSH moiety was attached 
through its sulfur atom to the predicted sulfur- 
incorporation site. They also found that puri- 
fied LmbE enzyme cleaves the amide bond 
of this intermediate, and that feeding either 
the isolated intermediate or the product of its 
cleavage by LmbE to an S. lincolnensis mutant 
in which the function of an MSH-regenerating 
gene, mshA, was disrupted led to restoration 
of lincomycin A production. The authors had 
identified mshA through genome sequencing 
and analysis as part of the study. These find- 
ings confirmed the function of LmbE and the 
intermediacy of its substrate and product in 
the biosynthetic pathway. 

Next, the researchers examined the function
of lmbV, another gene present in the linco-
mycin cluster. This gene encodes a homologue of an
enzyme that catalyses an MSH-dependent
isomerization reaction. When Zhao and
co-workers disrupted the function of lmbV,
they observed the unexpected accumulation
of another lincomycin-like intermediate, in
which an EGT moiety is attached through its
sulfur atom to the site occupied by MSH in the
metabolite isolated from the lmbE mutant. The
authors also identified the EGT-containing 
intermediate in their mshA-disruption mutant, 
which suggests that LmbV catalyses the 
replacement of the EGT moiety by MSH. They 
could not prove this directly, because they were 
not able to express and purify LmbV, but they 
confirmed their theory by showing that a 
homologue of LmbV — CcbV, an enzyme from 
a similar biosynthetic pathway’ — catalyses the 
proposed reaction. They further confirmed the 
key role of EGT in the biosynthesis of linco- 
mycin A by disrupting the function of egtD, a 
gene involved in the biosynthesis of EGT that 
they again found through genome mining.

Figure 1 | Biosynthesis of lincomycin A. Zhao et al. report the biosynthetic
mechanism by which a sulfur atom is incorporated into the naturally 
occurring antibiotic lincomycin A. The two main components of the molecule 
are an octose sugar (orange) and a PPL unit (blue). a, In the biosynthetic 
pathway, the LmbT enzyme attaches ergothioneine (EGT, a bacterial 
metabolite) to the octose sugar by forming a new carbon-sulfur bond; a
nucleotide (GDP) attached to the sugar is lost in the process. b, The enzymes
LmbC, N and D attach PPL to the sugar. c, LmbV then replaces EGT with 
mycothiol (MSH, a bacterial antioxidant), forming a new carbon-sulfur bond; 
the sulfur atom from MSH is the one that ends up in the antibiotic. d, Finally, a 
multi-step process beginning with a reaction catalysed by LmbE converts the 
MSH group into the methylmercapto group (-SCH3) of lincomycin A.

50 Years Ago


It has been reliably demonstrated 
that rats can discriminate between 
the presence or absence of X-rays... 
The process by which X-rays elicit 
arousal and orienting reactions 

in mammals has not yet been 
determined. However, for simplicity 
we assume this mechanism operates 
via a ‘radiation receptor’. Attempts
to locate this hypothetical radiation 
receptor have yielded conflicting 
results ... We used a narrow 
3/16-in. X-ray beam as a signal or 
conditioned stimulus to warn the 
animal of a subsequent shock to the 
paws. The beam was most effective 
when it was directed at the olfactory 
region of the head ... In an attempt
to clarify this issue, we conducted a 
study of the effectiveness of X-ray 
as an arousing stimulus in rats the 
olfactory bulbs of which had been 
removed ... The results indicate a 
distinct loss of sensitivity when the 
olfactory bulbs are removed. 

From Nature 6 February 1965 


100 Years Ago 


In Popular Astronomy Prof. E. C. 
Pickering quotes some interesting 
letters from Profs. Backlund,
of Pulkovo, and Schwarzchild,
of Potsdam, with reference to
astronomers and the war. None
of the Pulkovo astronomers have
been called to serve, but Prof. 
Backlund’s son is in the Russian 
ranks, and of French astronomers, 
M. Croze, astrophysicist of the 
Paris Observatory, has been 
summoned, as well as the son of 
the director, M. Baillaud, who has 
six sons and sons-in-law in the 
war. On the German side, many 
young astronomers are in the field. 
Dr. Zurhellen and Dr. Kühl, who
were with the eclipse expedition,
have been interned in Russia;
Dr. Münch, of Potsdam, is wounded
and a prisoner in France.

From Nature 4 February 1915 


Finally, Zhao et al. tested the function of 
another enzyme in the lincomycin pathway, 
LmbT, which they thought might attach EGT to 
a biosynthetic intermediate. LmbT is a homo- 
logue of glycosyltransferase enzymes, which 
attach sugars to other molecules. The research- 
ers performed a series of gene-disruption and 
in vitro biochemical experiments, establish- 
ing that LmbT must act before installation of 
the 4-propyl-L-proline (PPL) moiety, which 
forms part of the structure of lincomycin A. In 
the process, they also proved that three more 
enzymes — LmbC (ref. 8), LmbN and LmbD — 
collectively incorporate PPL into the antibiotic. 

Zhao and colleagues went on to isolate the 
suspected product of LmbT and to demon- 
strate the enzyme’s function using in vitro 
assays. They discovered that LmbT catalyses 
the transfer of lincomycin A’s sugar (for which 
the biosynthetic pathway has previously been 
reported’) to EGT, thus chemically activat- 
ing the sugar in readiness for its reaction with 
MSH later in the pathway. Such a role is com- 
pletely unprecedented: EGT was known to 
exist as a metabolite, but not as a substrate for 
an enzyme-catalysed reaction. 

A particularly impressive aspect of this 
work is the authors’ use of an intricate series 
of in vivo and in vitro experiments that relied 
on intermediates obtained from mutant cul- 
tures and from both enzymatic and chemi- 
cal syntheses, guided by comparative gene 
analysis and genome mining. More generally, 
the study demonstrates that integration of
primary metabolites (those that are essential
for an organism's survival, such as MSH and 
EGT) and secondary metabolites (non-essen- 
tial compounds, such as the products of the 
Lmb enzymes) is crucial for the biosynthesis 
of complex molecules. It also highlights the 
ingenious ways in which nature repurposes 
enzymes — in this case, using homologues 
of MSH-dependent detoxification enzymes 
for biosynthesis. And the establishment of 
functions for LmbE, LmbV and LmbT will no 
doubt help researchers to work out the func- 
tions of the enzymes’ numerous homologues 
in the ever-growing roster of sequenced 
genomes.

Climate sensitivity
in a warmer world 


Comparison of climate records from the Pliocene and Pleistocene geological 
epochs of the past five million years suggests that positive climate feedbacks 
are not strengthened during warm climate intervals. SEE ARTICLE P.49 


DAVID W. LEA 


A major concern in projecting future
climate change using models is that
positive climate feedbacks might
become enhanced in a warm climate, accel- 
erating future warming in response to rising 
greenhouse-gas levels. Climate feedbacks are 
changes in atmospheric or surface properties 
induced by climate change that magnify or 
diminish the overall temperature response. 
Their aggregate strength is represented by 
the climate sensitivity, which is the ratio of 
observed warming to climate forcing, such 
as increasing atmospheric carbon diox-
ide levels. Warm intervals of Earth’s recent
geological past, which can be studied through
climate proxies, provide a basis for testing the
response of climate sensitivity to warming.
On page 49 of this issue, Martínez-Botí et al.
use improved proxy atmospheric CO2 data to
compare climate-sensitivity determinations
from the warm Pliocene epoch, 5.3 million to
2.6 million years (Myr) ago, to those from the
cold, extensively glaciated Pleistocene epoch,
2.6 to 0.012 Myr ago. They find that climate
sensitivity differs little between these vastly
dissimilar times, once the influence of ice
sheets is removed.

Why should climate sensitivity be stronger 
in a warm world? A warmer world is likely to 
have less snow and ice, thereby reducing their

Figure 1 | Then and now. Comparison of Strathcona Fjord, Ellesmere Island, in the high Canadian Arctic during the Pliocene epoch (illustration on the left)
and today (photograph on the right). The changes illustrate the extreme polar amplification of warming during the Pliocene. Martínez-Botí et al. use climate
reconstructions from the Pliocene to determine whether positive climate feedbacks were stronger during warm intervals.


amplifying effect on climate change. But
how other feedbacks, such as water vapour
and clouds, respond to warming is less certain.
Simulations with climate models suggest that
the positive feedback due to water vapour may
strengthen in warmer climates, but uncer-
tainties about how cloud feedbacks respond
to warming confuse our understanding of the
overall dependence of climate sensitivity on
climate state.

Ancient climate records provide an alterna-
tive approach to assessing climate sensitivity,
through the analysis of proxies, which reveal
both the forcing (for example, atmospheric
CO2 levels or ice extent) and response (the
temperature change). This approach offers the
tremendous advantage of relying on natural
equilibrium climate states rather than on syn-
thetic ones simulated in models. Past climates
were also influenced by various slow feedbacks
such as ice sheets, vegetation and dust,
factors typically not included in climate simu-
lations. However, the method hinges on proxy
reconstructions that have associated uncer-
tainties, especially for marine-based atmos-
pheric CO2 reconstructions used in studies
reaching beyond 0.8 Myr ago, the age of the
oldest ice cores.

Palaeoclimate researchers have targeted
the Pliocene epoch because it is the most
recent time interval in which conditions were
substantially warmer, about 2-3 °C warmer
globally, than pre-industrial conditions. Proxy
reconstructions indicate that the Arctic climate
during the Pliocene was much warmer than it
is today, about 8-19 °C warmer, depending on
location and season (Fig. 1). But this extreme
Arctic warmth seems to have coexisted, para-
doxically, with atmospheric CO2 levels that
are similar to the present ones, implying an
extreme amplification of positive climate feed-
backs in the Pliocene.

Martínez-Botí and colleagues challenge
this existing hypothesis using a well-validated
technique to reconstruct Pliocene atmospheric
CO2 between 3.3 and 2.3 Myr ago at higher
temporal resolution and with less variability
than previous proxy reconstructions. Their
reconstruction clearly indicates for the first
time that mid-Pliocene atmospheric CO2 was
up to 60% higher than pre-industrial values
(450 parts per million (p.p.m.), compared with
280 p.p.m. before the Industrial Revolution
and 400 p.p.m. today). The new record also
indicates clear transitions in atmospheric CO2
that are coherent with known climate events,
including a drop in atmospheric CO2 between
2.9 and 2.7 Myr ago that precedes global cool-
ing and the onset of Northern Hemisphere
glaciation 2.6 Myr ago. These are remarkable
findings in themselves.

The researchers go a step further, applying
their results to climate sensitivity for the
warm Pliocene state by developing averaged
reconstructions of land and ocean tempera-
ture and comparing them directly to their
atmospheric CO2 reconstruction. The slope
relating the forcing (atmospheric CO2) and
response (temperature) at each time slice
yields a tight constraint on climate sensitiv-
ity that is specific to the Pliocene. The authors
find Pliocene climate sensitivity to be half as
strong as that found for the cold Pleistocene.
A repeat of the analysis after removing the
forcing associated with glacial-interglacial
changes in ice sheets reveals that Pliocene
and Pleistocene climate sensitivities to atmos-
pheric CO2 changes alone were essentially
the same.
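
The slope-based estimate described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the authors' procedure: the paired CO2 and temperature values are invented, forcing is expressed as the number of CO2 doublings relative to an assumed reference level `c0`, and `slope_per_doubling` is a hypothetical helper name.

```python
import math

def slope_per_doubling(co2_ppm, temp_anomaly_k, c0=278.0):
    """Least-squares slope of temperature against log2(CO2 / c0).

    The slope is the temperature change per CO2 doubling implied by the
    paired values. c0 is an assumed pre-industrial reference level.
    """
    x = [math.log2(c / c0) for c in co2_ppm]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(temp_anomaly_k) / n
    sxy = sum((xi - mean_x) * (yi - mean_y)
              for xi, yi in zip(x, temp_anomaly_k))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Invented example values (not data from the study): temperatures are
# constructed so that the true slope is 3 K per doubling.
co2 = [280.0, 320.0, 360.0, 400.0, 450.0]
temps = [3.0 * math.log2(c / 278.0) for c in co2]
print(round(slope_per_doubling(co2, temps), 2))
```

Because the slope is insensitive to a constant offset in temperature, the same function could be applied to anomalies measured against any baseline.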

In a broader context, these results also
relate to attempts to use the instrumental
temperature record to narrow the range
of equilibrium climate sensitivity, which
is the equilibrium temperature change caused
by a doubling of atmospheric CO2, allowing
for ‘fast’ feedback processes only. Some stud-
ies have argued that the slightly weaker rate of
global warming since 2001 reduces the lower
boundary of equilibrium climate sensitivity
to well below 2 °C. Although Martínez-Botí
and colleagues’ derived Earth-system sensi-
tivity includes slow feedbacks, which com-
plicates direct comparison to results from
climate models, their results are likely to trans-
late to an equilibrium climate sensitivity of
between 2 and 3 °C, well within the generally
accepted range.

Despite the significant advance Martínez-
Botí and co-workers’ study represents,
several challenges remain. First, given the
wide range of proxy atmospheric CO2 data for
the Pliocene, it will be essential to validate the
new results and assess why earlier reconstruc-
tions and methodologies differ from this one.
Second, the extreme Pliocene warming in
the terrestrial Arctic (Fig. 1) still requires
enhanced polar climate feedbacks that remain
unexplained. Finally, for climate modellers,
there remains the substantial challenge of
reconciling emerging palaeoclimate-based
sensitivity results with simulations of both past
and future climate states.

Plio-Pleistocene climate sensitivity
evaluated using high-resolution
CO2 records


M. A. Martínez-Botí, G. L. Foster, T. B. Chalk, E. J. Rohling, P. F. Sexton, D. J. Lunt, R. D. Pancost, M. P. S. Badger
& D. N. Schmidt


Theory and climate modelling suggest that the sensitivity of Earth’s climate to changes in radiative forcing could depend 
on the background climate. However, palaeoclimate data have thus far been insufficient to provide a conclusive test of 
this prediction. Here we present atmospheric carbon dioxide (CO2) reconstructions based on multi-site boron-isotope
records from the late Pliocene epoch (3.3 to 2.3 million years ago). We find that Earth’s climate sensitivity to CO2-based 
radiative forcing (Earth system sensitivity) was half as strong during the warm Pliocene as during the cold late Pleisto- 
cene epoch (0.8 to 0.01 million years ago). We attribute this difference to the radiative impacts of continental ice-volume 
changes (the ice-albedo feedback) during the late Pleistocene, because equilibrium climate sensitivity is identical for the 
two intervals when we account for such impacts using sea-level reconstructions. We conclude that, on a global scale, no 
unexpected climate feedbacks operated during the warm Pliocene, and that predictions of equilibrium climate sensitivity 
(excluding long-term ice-albedo feedbacks) for our Pliocene-like future (with CO2 levels up to maximum Pliocene levels of
450 parts per million) are well described by the currently accepted range of an increase of 1.5 K to 4.5 K per doubling of CO2.


Since the start of the industrial revolution, the concentration of atmo-
spheric CO2 (and other greenhouse gases) has increased dramatically
(from ~280 to ~400 parts per million). It has been known for over
100 years that changes in greenhouse gas concentration will cause the
surface temperature of Earth to vary. A wide range of observations
reveals that the sensitivity of Earth’s surface temperature to radiative
forcing amounts to ~3 K warming per doubling of atmospheric CO2
concentration (with a 66% confidence range of 1.5-4.5 K; see refs 1 and 3),
caused by direct radiative forcing by CO2 plus the action of a number of
fast-acting positive feedback mechanisms, mainly related to atmospheric
water vapour content and sea-ice and cloud albedo. Uncertainty in the
magnitude of these feedbacks confounds our ability to determine the
exact equilibrium climate sensitivity (ECS; the equilibrium global tem-
perature change for a doubling of CO2 on timescales of about a century,
after all ‘fast’ feedbacks have had time to operate; see ref. 3 for more
detail). Although the likely range of values for ECS is 1.5-4.5 K per CO2
doubling, there is a small but finite possibility that climate sensitivity
may exceed 5 K (see ref. 1). Understanding the likely value of ECS clearly
has important implications for the magnitude, eventual impact and
potential mitigation of future climate change.
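
To make the "per doubling" definition concrete, the following Python sketch converts a CO2 level into an equilibrium warming for a chosen sensitivity. It assumes the standard logarithmic dependence of forcing on CO2, so that warming scales with the number of doublings; the function name `equilibrium_warming` and the default reference level of 280 p.p.m. are illustrative choices rather than anything specified in the text, and the result is an equilibrium value, not the warming observed to date.

```python
import math

def equilibrium_warming(co2_ppm, sensitivity_k=3.0, c0_ppm=280.0):
    """Equilibrium warming (K) relative to c0_ppm.

    sensitivity_k is the assumed warming per CO2 doubling; the number of
    doublings is log2(co2_ppm / c0_ppm).
    """
    return sensitivity_k * math.log2(co2_ppm / c0_ppm)

# Spread of equilibrium warming at ~400 p.p.m. across the quoted
# 1.5-4.5 K range of sensitivity.
for ecs in (1.5, 3.0, 4.5):
    print(ecs, round(equilibrium_warming(400.0, ecs), 2))
```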

Any long-range forecast of global temperature (that is, beyond the
next 100 years) must also consider the possibility that ECS could depend
on the background state of the climate. That is, in a warmer world,
some feedbacks that determine ECS could become more efficient and/or
new feedbacks could become active to give additional warmth for a given
change in radiative forcing (such as those relating to methane cycling,
atmospheric water vapour concentrations, in addition to changes
in the relative opacity of CO2 to long-wave radiation). One approach
to identify whether ECS depends on the climate background state is to
reconstruct ECS during periods in the geological past when Earth was
warmer than today.

The Pliocene (2.6-5.3 million years (Myr) ago) is one such time, with
the warmest intervals between 3.0 Myr and 3.3 Myr ago about 3 K glob-
ally warmer than pre-industrial times; the mean sea level was 12-32 m
above the present level. Although most of this warmth is commonly
ascribed to increased atmospheric CO2 levels, it has been suggested
that simple comparisons of the observed temperature change in the geo-
logical record with the climate forcing from CO2 alone are unable to
constrain ECS. Instead, a parameter termed the Earth system sensi-
tivity (ESS) is defined, as the change in global temperature for a dou-
bling of CO2 once both fast and slow feedbacks have acted and the whole
Earth system has reached equilibrium. (In contrast, ECS excludes the
slow feedbacks; for a discussion of fast versus slow feedbacks, see ref. 3.)
The most important slow feedbacks are those related to ice-albedo and
vegetation-albedo changes. Because of these slow feedbacks, Pliocene
ESS is thought to have been ~50% higher than ECS, with some exist-
ing geological data suggesting a Pliocene ESS range of 7-10 K per CO2
doubling, which greatly exceeds a modern ESS estimate of ~4 K per
CO2 doubling. If ECS was similarly enhanced, then that would imply
that either extra positive fast feedbacks operated, or that existing posi-
tive fast feedbacks were more efficient, thus increasing the temperature
response for a given level of CO2 forcing.

Understanding past climate sensitivity depends critically on the accu-
racy of the CO2 data used. Despite a tendency towards increased agree-
ment between different CO2 proxies, individual estimates of the partial
pressure of CO2 (pCO2) for the Pliocene still range from ~190 µatm to
~440 µatm (Fig. 1a, b) and there is little coherence in the trends described
by the various techniques (Fig. 1a, b). This hinders any effort to accurately

based on δ13C of sedimentary alkenones (dark green circles (ODP 999);
aquamarine squares (ODP 999); dark orange (ODP 1208), purple circles
(ODP 806); and dark red squares (ODP 925)). Error bars are uncertainty in
pCO2^atm at the 95% level of confidence. b, δ11B of planktic foraminifera from
ODP 999 (blue circles for Globigerinoides sacculifer and blue squares for
G. ruber; red squares for G. sacculifer) and stomatal density of fossil leaves
(purple filled circle). Error bars are uncertainty in pCO2^atm at the 95% level
of confidence. c, New boron isotope data (this study) from ODP 999 (blue
circles) and ODP 662 (red circles). Error bands for ODP 999 denote 1 standard
deviation (sd) (dark blue) and 2 sd (light blue) analytical uncertainty; error
bars for ODP 662 show 2 sd analytical uncertainty. d, Atmospheric pCO2
determined from data shown in c for ODP 999 (blue circles) and ODP 662
(red circles). Error band encompasses 68% (dark blue) and 95% (light blue) of
10,000 Monte Carlo simulations of pCO2^atm using the data in c and a full
propagation of all the key uncertainties (see Methods). For ODP 662 error bars
encompass 95% of 10,000 simulations. Dotted lines show the modelled
threshold of Northern Hemisphere glaciation (280 µatm). e, Benthic δ18O
stack, with prominent marine isotope stages labelled (blue for glacial, red for
interglacial stages). Thick lines on several panels are non-parametric smoothers
through the data. The blue open circle in d highlights the data point that is
identified as an outlier in Fig. 2 and not used in subsequent regressions.


constrain Pliocene ECS or ESS. To better determine Pliocene CO2 levels,
we generated a new record, based on the boron isotopic composition
(δ11B) of the surface mixed-layer dwelling planktic foraminiferal spe-
cies Globigerinoides ruber from Ocean Drilling Program (ODP) Site 999
(Caribbean Sea, 12° 44.64′ N, 78° 44.36′ W, 2,838 m water depth; Ex-
tended Data Fig. 1) at a temporal resolution (one sample about every
13,000 years (13 kyr); Fig. 1c) that is more than three times higher than
previous δ11B records (one sample every 50 kyr; Fig. 1b). The δ11B of
G. ruber is a well-constrained function of pH (ref. 18) and seawater
pH is well correlated with [CO2]aq (the aqueous concentration of CO2),
because both are a function of the ratio of alkalinity to total dissolved
carbon in seawater. In the absence of major changes in surface hydro-
graphy, [CO2]aq is largely a function of atmospheric CO2 levels, and
δ11B-derived CO2 has been demonstrated to be an accurate recorder of
atmospheric CO2 (Extended Data Fig. 2a). Today, the surface water
at ODP Site 999 is close to equilibrium with the atmosphere with respect
to CO2 (expressed here as ΔpCO2 = pCO2^sw − pCO2^atm = +21 µatm; Ex-
tended Data Fig. 1) and has remained so for at least the past 130 kyr
(Extended Data Fig. 2). ODP Site 999 also benefits from a detailed astro-
nomically calibrated age model and high abundance of well-preserved
planktic foraminifera throughout the past 4 Myr (refs 23, 24). During
our study interval it is also unlikely to have been influenced by long-term
oceanographic changes such as the emergence of the Panama isthmus
~3.5 Myr ago (see detailed discussion in ref. 23). To increase confi-
dence that atmospheric CO2 changes are driving our pH (and hence our
pCO2^atm) record for ODP Site 999 and that the air-sea CO2 disequilibrium
remained similar to modern values, we also present lower-resolution
δ11B data from G. ruber from ODP Site 662 (equatorial Atlantic, Fig. 1c;
1° 23.41′ S, 11° 44.35′ W, 3,821 m water depth; Extended Data Fig. 1),
where current mean annual ΔpCO2 is +29 µatm with a seasonal maxi-
mum of +41 µatm (ref. 21). Analytical methodology and information
detailing precisely how pCO2^atm is calculated, with full propagation of
uncertainties, can be found in the Methods (with full δ11B and pCO2
values listed in Supplementary Tables 1 and 2).


A new record of Pliocene pCO2 change


Where our data for both sites overlap in time, reconstructed pCO2^atm
values 2.3-3.3 Myr ago agree within uncertainty (Fig. 1d; Extended Data
Fig. 3), and are consistent with most independent records (see Fig. 1a, b;
Extended Data Fig. 2b, c), confirming that the variations we observe are
predominantly driven by changes in atmospheric CO2 concentrations.
However, the enhanced resolution of our δ11B-pCO2^atm record (Fig. 1d)
also reveals a hitherto undocumented level of structure in the CO2
variability during the one-million-year period investigated, including
a transition centred on 2.8 Myr ago, spanning ~200 kyr, during which
average pCO2^atm undergoes a decrease of ~65 µatm (Fig. 1d).
Detailed atmospheric CO2 measurements from ice cores show orbital-
scale (~100 kyr) oscillations in pCO2^atm with a peak-to-trough variation
of ~80-100 µatm through the late Pleistocene (90% of the pCO2 values
lie between +36 µatm and -41 µatm of the long-term mean; Extended
Data Figs 2 and 4). Once the long-term trend is removed from our
Plio-Pleistocene data (thick blue line in Fig. 1d), and we have taken into
account our larger analytical uncertainty (see Methods), we observe
orbital-scale variations in our δ11B-pCO2^atm record of only slightly smaller
amplitude than the ice-core pCO2^atm record (0-0.8 Myr) and for the
last 2 Myr in other δ11B-based records (Extended Data Fig. 4 and
Methods), which is in clear contrast with the benthic δ18O, which shows
increasing variability over the last 3 Myr (Fig. 1e and Extended Data Fig. 4).
Given the different amplitudes of climate variability, the observed 
similarity between Pliocene and late Pleistocene peo, variability seems 
counter-intuitive given the notion that CO, is a key factor in amplify- 
ing glacial-interglacial climate change*”*”*"’. This is illustrated by a 
well-defined nonlinear relationship in a cross plot between deep-sea 
benthic 8'°O and In(CO,/C,) (where C, is the pre-industrial CO, level 
of 278 atm), which accounts for the logarithmic nature of the climate 
forcing by CO; (Fig. 2b). Note also the clear overlap between Pleisto- 
cene (0-2.2 Myr) ice-core CO, measurements and 5!'B-based CO, 
reconstructions in this plot (Fig. 2b; Extended Data Fig. 2). A similar 
relationship is also evident in raw 5''B-space (Fig. 2a). Below an inflec- 
tion at about poo," = 275 + 15 patm (equating to In(CO2/C,) ~ 0), 
benthic 5'°O showsa steeper relationship with CO-based forcing than 
it does above this value (Fig. 2). This probably reflects some combina- 
tion of: (1) growth of larger Northern Hemisphere ice sheets at pco,"™ 
below 275 + 15 atm (ref. 33), increasing radiative ice-albedo feedback 
and amplifying climate forcing by CO, change; (2) an increase in oxy- 
gen isotope fractionation in precipitation with increasing size of the 
Figure 2 | Relationship between δ¹¹B, climate forcing from CO₂ and δ¹⁸O.
a, b, δ¹¹B versus δ¹⁸O (a) and ln(CO₂/C₀) versus δ¹⁸O (b) for data from the last
3 Myr. Boron data in a are from this study (blue open and closed circles)
and published studies (green circles; blue triangles). Ice-core CO₂ data are
shown as open red circles. The vertical dashed line is at a CO₂ of 278 µatm.
The data point removed from subsequent regression analysis is shown as
open blue circles in a and b. Note that for clarity the δ¹¹B-pCO₂ data from
ref. 23 are not plotted. The black line is a non-parametric regression through
all the data shown. The δ¹¹B data from ref. 30 have been corrected for laboratory
and inter-species differences through a comparison between core-top δ¹¹B values.


ice sheets, which leads to a proportionally greater ¹⁸O enrichment in
seawater; and (3) potentially stronger deep-sea cooling at low pCO₂ᵃᵗᵐ
due to the high-latitude-focused influences of the ice-albedo feedback
process. These findings highlight the profound impacts of Northern
Hemisphere ice-sheet growth on climate variability in the Pleistocene,
relative to the Pliocene (Fig. 2b).

Our data show that the ~275 ± 15 µatm threshold was first crossed ~2.8 Myr ago
during Marine Isotope Stage (MIS) G10 (Fig. 1d, horizontal dashed line), and,
more persistently, during subsequent MISs G6 (2.72 Myr ago), G2 (2.65 Myr ago),
and 100 (2.52 Myr ago), when values as low as 233 […] µatm (95% confidence) were
reached and when intervening interglacial values also seem to have been
suppressed (Fig. 1c, d). These isotope stages are notable in that they are
associated with an increase in the amplitude of glacial-interglacial sea-level
oscillations (Extended Data Fig. 5b) and coincide with the timing of the first
substantial continental glaciations of Europe, North America and the Canadian
Cordillera, as reconstructed by ice-rafted debris and observations of relic
continental glacial deposits. Hence, our high-resolution pCO₂ᵃᵗᵐ record robustly
confirms previous hypotheses (based on low-resolution CO₂ data) that the first
substantial stages of glaciation in the Northern Hemisphere, as well as a
recently recognized deep-sea cooling during the late Pliocene/early Pleistocene,
coincided with a significant decline in mean atmospheric pCO₂ᵃᵗᵐ 2.7–2.9 Myr ago
of ~40–90 µatm (the mean value of pCO₂ᵃᵗᵐ 3.0–3.2 Myr ago minus its mean
2.4–2.7 Myr ago = 66 ± 26 µatm; two-tailed P < 0.001, n = 40).


Efficiency of climate feedbacks 


The high fidelity of the boron isotope pH/pCO₂ᵃᵗᵐ proxy (Extended Data Fig. 2),
coupled with the high resolution of our pCO₂ᵃᵗᵐ record, offers an opportunity to
examine the sensitivity of Earth's climate system to forcing by CO₂ during a
period when Earth's climate was, on average, warmer than today. For this
exercise, global temperature estimates are also needed. We consider two
approaches for this. The first is an estimate of global mean annual surface air
temperature change (ΔMAT) over the last 3.5 Myr, from a scaling of the Northern
Hemisphere climate required to drive an ice-sheet model to produce deep-ocean
temperature and ice-volume changes consistent with benthic δ¹⁸O data
(Fig. 3a, b). This approach produces a continuous record of global temperature
that agrees well with independent constraints for discrete time intervals (see
ref. 35).

We supplement ΔMAT with a record from a second approach, which is independent
from benthic δ¹⁸O values. For this, we generated a sea surface temperature stack
(SSTₛₜ) from 0 to 3.5 Myr ago (Fig. 3c, d), comprising ten high-resolution
(average ~4 kyr) SST records based on alkenone unsaturation ratios (expressed as
the Uᴷ′₃₇ palaeotemperature index), from latitudes between 41 °S and 57 °N. The
selected sites (see Extended Data Fig. 1b) all offer near-continuous temporal
coverage of the last 3.5 Myr (see Methods). Our SST stack agrees well with
independent, higher-density compilations of global SST change (blue line in
Fig. 3c), indicating that SSTₛₜ offers a reliable approximation of global SST
change (see Methods for more details). Moreover, our SST stack allows us to
directly compare the major SST changes, within the same archives, between the
Plio-Pleistocene and late Pleistocene.

When comparing temperature records from the two approaches considered, it must
be emphasized that ΔMAT reflects the global mean annual surface air temperature
change, while the SST stack approximates the global mean sea surface temperature
change. Hence, their amplitudes of variability will be different, mainly because
SSTₛₜ does not include temperature changes over land. Approximately,
ΔSST = ΔMAT × 0.66 (refs 32 and 42), but a direct conversion is not needed here,
because we merely aim to contrast Pliocene climate behaviour with that for the
Pleistocene, within the same data types.

To determine the sensitivity of global SST and ΔMAT to CO₂ forcing in the
Pliocene and Pleistocene, we use a time series of forcing calculated from our
new and existing CO₂ records (Fig. 3e–h), and regress these against both ΔMAT
and the SST stack (Fig. 3a to d; Supplementary Tables 1–3). The regression
slopes then describe the average temperature change ΔT (in units of K) per watt
per square metre of forcing (ΔF) for each time interval. These gradients
therefore approximate the commonly used sensitivity parameter S = ΔT/ΔF (in
units of K W⁻¹ m²) for describing global temperature change for a given forcing.
In this scheme, a doubling of atmospheric CO₂ is equivalent to a forcing of
3.7 W m⁻², so that for the 66% confidence interval of modern climate sensitivity
quoted by ref. 1, the present-day equilibrium value of S (which we denote S^a,
where superscript 'a' is for 'actuo', after ref. 3) is 1.5/3.7 to
4.5/3.7 = 0.4–1.2 K W⁻¹ m². However, using palaeoclimate
Figure 3 | Pleistocene and late Pliocene time series. a, b, ΔMAT. c, d, ΔSST.
Data from this study are shown in red and from a stack of a more
comprehensive compilation in blue. Uncertainty envelopes at 95% confidence
for both temperature records are shown in red. e, ΔF_CO2 for the Pleistocene
from ice-core data. f, ΔF_CO2 for the late Pliocene, calculated using the
CO₂ data from this study. g, ΔF_CO2,LI calculated using data in e and published
sea-level records (R14 from ref. 13, VDW11 from ref. 35 and R09+E12
from ref. 44 for 0–520 kyr and ref. 45 for 520–800 kyr). h, ΔF_CO2,LI for the
late Pliocene calculated using the CO₂ data from this study and published
sea-level records (N09 from ref. 46 recalculated by ref. 12, R14 from ref. 13,
VDW11 from ref. 35). Error bands in e to h represent the uncertainty in smoothed
CO₂ record and sea level (at 95% confidence), propagated using a Monte Carlo
approach (n = 1,000) for each reconstruction.


data it is not possible to determine the direct equivalent of S^a; instead,
such studies constrain a 'past' parameter (S^p), which includes the combined
action of both fast and slow feedbacks. Note that ESS (in units of K) =
S^p × 3.7. Explicitly accounting for slow feedback processes in determinations
of S^p can make it approximate S^a (ref. 3). Following ref. 3, an S^p estimate
after accounting for carbon-cycle feedback is indicated by S_CO2, and one
accounting for both carbon-cycle feedbacks and land-ice albedo feedbacks is
S_CO2,LI, where the latter gives a useful approximation of S^a. We follow this
approach, using S^p = ΔMAT/ΔF and S^p,SST = ΔSST/ΔF (both in units of
K W⁻¹ m²). Note that our determinations of the sensitivity parameter are based on
our entire reconstructed time series, rather than on a simple comparison between
a limited Pliocene average and the modern average, as was done in previous
studies. Since we calculate an S^p (and S^p,SST) for the Pliocene and compare
this to the late Pleistocene S^p (and S^p,SST), we also avoid complications
caused by independent changes in boundary conditions (such as topographic
changes) because we assess sensitivity within each relatively short time window
(2.3–3.3 Myr ago versus 0–0.8 Myr ago). In addition, our approach emphasizes
relative changes in CO₂ levels and temperature over the intervals considered,
rather than absolute values. This improves accuracy because relative changes are
much better constrained than absolute temperature and pCO₂ᵃᵗᵐ values from proxy
data (see Methods for further discussion).
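
The unit conversions used in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are ours, and the only inputs are values quoted in the text (3.7 W m⁻² per CO₂ doubling and the 1.5–4.5 K range of ref. 1).

```python
# Forcing from a doubling of atmospheric CO2, as quoted in the text (W m^-2).
F_2XCO2 = 3.7


def sensitivity_parameter(warming_per_doubling_k):
    """Convert warming per CO2 doubling (K) into S (K W^-1 m^2)."""
    return warming_per_doubling_k / F_2XCO2


def warming_per_doubling(s):
    """Inverse conversion: S (K W^-1 m^2) into warming per CO2 doubling (K)."""
    return s * F_2XCO2


low = sensitivity_parameter(1.5)
high = sensitivity_parameter(4.5)
print(f"S^a range: {low:.1f}-{high:.1f} K W^-1 m^2")  # S^a range: 0.4-1.2 K W^-1 m^2
```

The same multiplication by 3.7 turns any S^p estimate into an ESS value in kelvin.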

Preliminary regression of ΔMAT against Pliocene pCO₂ᵃᵗᵐ identified one data point
(at 2,362 kyr; white circle in Figs 1d and 2) with a particularly large residual
and notable leverage on the least-squares regression (a high Cook's distance).
With interglacial-like pCO₂ᵃᵗᵐ values but glacial-like δ¹⁸O values (Fig. 2), this
point may reflect a chronological error, or a short period of unusually high
air-sea disequilibrium with respect to CO₂ at ODP Site 999. To avoid the
influence of this one point on subsequent linear regressions, we have removed it
from our δ¹¹B-pCO₂ᵃᵗᵐ record. The remaining pCO₂ᵃᵗᵐ data (73 points) were
interpolated to a constant resolution (1 kyr), smoothed with a 20-kyr moving
average to reduce short-term noise and resampled back to the original data
spacing (about one sample every 13 kyr). A Monte Carlo approach was followed to
determine uncertainties for this smoothed record given the uncertainty in the
δ¹¹B-derived pCO₂ᵃᵗᵐ. Radiative forcing changes due to pCO₂ᵃᵗᵐ changes are
calculated using ΔF_CO2 = 5.35 ln(CO₂/C₀), where C₀ = 278 µatm (Fig. 3). We
ignore mean annual forcing by orbital variations because it is small
(<0.5 W m⁻² with a periodicity of 100–400 kyr) and averages out over the length
of our records. Linear regressions of ΔMAT and SSTₛₜ versus ΔF_CO2 were performed
using an approach that yields a probabilistic estimate of slope, and hence
sensitivity to CO₂ forcing (S_CO2 = ΔT/ΔF_CO2 or S_CO2,LI = ΔT/ΔF_CO2,LI), which
fully accounts for uncertainties in both x and y variables (see Methods and
Fig. 4). Figure 5a–d displays probability density functions of the
determinations of slope for each time interval. This analysis reveals that,
irrespective of the global temperature record used (ΔMAT or the SSTₛₜ), the
average global sensitivity of Earth's climate to forcing by CO₂ only (S_CO2) is
approximately twice as high for the Pleistocene as it is for the Pliocene
(Figs 4 and 5). This validates previous inferences of a strong additional
feedback factor during the Pleistocene (at pCO₂ᵃᵗᵐ levels below ~280 µatm), which
probably arises from the growth and retreat of large Northern Hemisphere ice
sheets and their role in changing global albedo.
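
The smoothing and forcing steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the centred window and its truncation at the ends of the record are our assumptions, and the input values are hypothetical.

```python
import math
from bisect import bisect_right

C0 = 278.0  # pre-industrial CO2 (uatm), as quoted in the text


def co2_forcing(pco2, c0=C0):
    """Radiative forcing (W m^-2) relative to pre-industrial: 5.35 ln(CO2/C0)."""
    return 5.35 * math.log(pco2 / c0)


def interp(x, xs, ys):
    """Linear interpolation on ascending xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def smooth_record(ages, values, step=1.0, window=20.0):
    """Interpolate to `step` kyr, apply a centred moving average of width
    `window` kyr (truncated at the record ends; an assumption), then
    resample at the original sample ages."""
    n = int(round((ages[-1] - ages[0]) / step))
    grid = [ages[0] + i * step for i in range(n + 1)]
    dense = [interp(g, ages, values) for g in grid]
    half = int(window / step / 2)
    smoothed = []
    for i in range(len(dense)):
        chunk = dense[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return [interp(a, grid, smoothed) for a in ages]


# Hypothetical record: ages in kyr, pCO2 in uatm (not data from the study).
ages = [2300.0, 2313.0, 2326.0, 2339.0, 2352.0]
pco2 = [280.0, 250.0, 300.0, 260.0, 310.0]
forcing = [co2_forcing(v) for v in smooth_record(ages, pco2)]
```

A doubling of CO₂ relative to C₀ gives `co2_forcing(556.0)` ≈ 3.7 W m⁻², consistent with the value used for the sensitivity parameter.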

Given that, to a first order, the Earth system responds to radiative forcing in
a consistent fashion, largely independent of the nature of that forcing, we can
determine the climate forcing arising from continental ice albedo changes via a
relatively simple parameterization of sea-level change (ΔF_LI = sea-level change
(in units of metres) × 0.0308 W m⁻²; following refs 31 and 32). Several
reconstructions of sea-level change partially or completely span the last
3.5 Myr (for example, refs 13, 35, 44, 45 and 46 recalculated by ref. 12), and
we explore the implications of each of these independent records. Cross-plots of
combined CO₂ and ice-albedo forcing (ΔF_CO2 + ΔF_LI = ΔF_CO2,LI) versus ΔMAT and
SSTₛₜ are shown in Fig. 4 for the Pliocene and Pleistocene. Figure 5e–h displays
the influence of choices of temperature and sea-level record on our
determinations of S_CO2,LI (= ΔT/ΔF_CO2,LI). In contrast to S_CO2, S_CO2,LI is
similar for both the Pliocene and Pleistocene, regardless of temperature record
or other parameter choices (Fig. 5). This robustly indicates that the apparent
difference between Pliocene and Pleistocene climate sensitivity arises almost
entirely from ice-albedo feedback influences.
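
The effect of adding the land-ice term can be illustrated with a short sketch. It uses ordinary least squares, which ignores the uncertainties in x and y that the probabilistic approach in the text accounts for, and the four data points are hypothetical glacial-cycle values chosen only to show the arithmetic.

```python
import math


def co2_forcing(pco2, c0=278.0):
    """dF_CO2 (W m^-2) = 5.35 ln(CO2/C0), with C0 = 278 uatm."""
    return 5.35 * math.log(pco2 / c0)


def land_ice_forcing(sea_level_m):
    """dF_LI (W m^-2) from sea level relative to present (m),
    using 0.0308 W m^-2 per metre as quoted in the text."""
    return 0.0308 * sea_level_m


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (no uncertainty propagation)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical values, for illustration only (not data from the study).
pco2 = [280.0, 240.0, 200.0, 180.0]    # uatm
sea_level = [0.0, -40.0, -90.0, -120.0]  # m relative to present
delta_t = [0.0, -1.5, -3.5, -5.0]      # K

f_co2 = [co2_forcing(p) for p in pco2]
f_co2_li = [f + land_ice_forcing(s) for f, s in zip(f_co2, sea_level)]

s_co2 = ols_slope(f_co2, delta_t)        # sensitivity to CO2 forcing only
s_co2_li = ols_slope(f_co2_li, delta_t)  # sensitivity to CO2 + land-ice forcing
```

Because the land-ice forcing has the same sign as the CO₂ forcing in these examples, the combined forcing is larger and the fitted slope `s_co2_li` is smaller than `s_co2`.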

It also implies that all of the other feedbacks that amplify climate forcing by
CO₂ (for example, sea-ice and cloud albedo, water vapour, vegetation, aerosols,
other greenhouse gases) must have operated with similar efficiencies during both
the Pliocene and Pleistocene. Thus, we find no evidence that additional
(unexpected) positive feedbacks had become active to amplify Earth system
sensitivity to CO₂ forcing during the warm Pliocene. Alternatively, if
additional positive feedbacks did become active (for example, an increase in
steady-state methane concentration or changes in cloud properties), then their
effect must have been negated by the loss of other amplifying feedbacks (for
example, Arctic sea-ice) or the addition of more negative feedbacks. This
finding is at odds with previous studies (such as refs 16, 47), most probably
because of differences in our approach to determine Pliocene climate sensitivity
(that is, we determine a sensitivity within the Pliocene) and shortcomings in
the proxy systems used by the earlier investigations, both in terms of CO₂ and
temperatures (see ref. 48). For instance, Fig. 1d (and Extended Data Fig. 2)
indicates that both orbital-scale variability in pCO₂ᵃᵗᵐ and the major decline at
2.7–2.9 Myr ago are absent from the previously used alkenone-based pCO₂ᵃᵗᵐ
records and as a result regressions of temperature and alkenone-derived forcing
are poorly defined (Extended Data Fig. 2d–f).




cases the slope m and standard error uncertainty 
are determined by least-squares regression. Also 
shown are the P values for the regressions. 
Constraints on climate sensitivity


Using the geological record to estimate ECS directly (and thus S^a) is
problematic because information on the appropriate magnitude of a number of key
feedbacks (such as vegetation albedo) is typically unavailable. Nonetheless,
considerable effort has determined that ECS estimates based on the last glacial
maximum fall within the range of ECS estimates from other approaches (1.5–4.5 K
per CO₂ doubling, or 0.4–1.2 K W⁻¹ m²; ref. 1). Our analysis implies that a
similar ECS applies to the Pliocene and early Pleistocene (2.3–3.3 Myr ago;
Fig. 5; Supplementary Table 4). In addition, our estimate of Pliocene S_CO2
using ΔMAT lies within a range of 0.6–1.5 K W⁻¹ m² (at 95% confidence), meaning
that, once all feedbacks have played out for future CO₂ doubling, ESS
(= S_CO2 × 3.7) will very probably (95% confidence) be <5.2 K and will


Figure 5 | Probability density functions of the slope from regressions of
temperature against climate forcing. a, c, e, g, ΔMAT and b, d, f, h, ΔSST
against ΔF_CO2 and ΔF_CO2,LI for the Pleistocene (a, b, e, f) and Pliocene
(c, d, g, h), taking into account the uncertainties on all variables (see text).
In e–h individual probability density functions are shown for different choices
of sea level: the combined probability density function shown in bold is the sum
of these different probability density functions and therefore also incorporates
uncertainty related to the choice of sea-level record. Also shown are the median
(in boldface), the 68th percentile (dot-dash) and 95th percentiles (dashed).
probably (68% confidence) fall within a range of 3.0–4.4 K (Supplementary
Table 4).

In May 2013, atmospheric CO₂ levels crossed the 400 parts per million threshold
to values last seen during the Pliocene (Fig. 1c). Given current CO₂ emission
rates, global temperatures may reach those typical of the warm periods of the
Pliocene by 2050. Our findings suggest that, if the Earth system behaves in a
similar fashion to how it did during the Pliocene as it continues to warm in the
coming years, an ECS of 1.5–4.5 K per CO₂ doubling probably provides a reliable
description of the Earth's equilibrium temperature response to climate forcing,
at least for global temperature rise up to 3 K above the pre-industrial level.
Studies of even warmer intervals in the deeper geological past (well before
3.3 Myr ago) are needed to determine whether any additional climate feedbacks
should be expected as Earth warms even further into the twenty-second century if
CO₂ emissions continue unabated.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS
Sample locations. We present new data from two deep ocean sites: ODP Site 999
(Caribbean Sea, 12° 44.64′ N and 78° 44.36′ W) and ODP Site 662 (equatorial
Atlantic, 1° 23.41′ S, 11° 44.35′ W). Both sites have well-constrained age models
for the Pliocene and are part of the Lisiecki and Raymo benthic foraminifera
δ¹⁸O stack (hereafter LR04). Sedimentation rates are comparable between the sites
(~3 cm kyr⁻¹ at ODP 999 and ~4 cm kyr⁻¹ at ODP 662). At ODP Site 999,
seventy-four samples were analysed at an average temporal resolution of around
one sample every 13 kyr, targeting several glacial and interglacial maxima. ODP
Site 662 was analysed at much lower resolution (8 samples in 1,000 kyr = 1
sample every 125 kyr on average), and the chosen samples were limited to peak
interglacial conditions to avoid potential upwelling influences during glacial
periods. The extent of the modern air-sea CO₂ disequilibrium at each location is
displayed in Extended Data Fig. 1a.
Analytical methodology. Between 90 and 200 individuals of G. ruber (~10 µg per
shell) were picked from the 300–355 µm size fraction from ODP Sites 999 and 662.
Foraminiferal samples were crushed between cleaned glass microscope slides and
subsequently cleaned according to established oxidative cleaning methods. After
cleaning, samples were dissolved in ~0.15 M Teflon-distilled HNO₃, centrifuged
and transferred to 5 ml Teflon vials for storage. An aliquot (~20 µl; ~7% of the
total sample) was taken for trace element analysis. Boron was separated from the
dissolved samples using Amberlite IRA-743 boron-specific anion exchange resin
following established procedures. Boron isotope ratios were measured on a Thermo
Scientific Neptune multicollector inductively coupled plasma mass spectrometer
(MC-ICPMS) at the University of Southampton according to methods described
elsewhere.

External reproducibility of δ¹¹B analyses is calculated following the approach
of ref. 54, and is described by the relationship:

$$2\sigma = 1.87\,\exp\!\left(-20.6\,[^{11}\mathrm{B}]\right) + 0.22\,\exp\!\left(-0.43\,[^{11}\mathrm{B}]\right) \qquad (1)$$

where [¹¹B] is the intensity of the ¹¹B signal in volts (see ref. 18 for further details).
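
As a worked illustration of equation (1), the sketch below evaluates the reproducibility for a given signal intensity. It assumes the equation reads 2σ = 1.87 exp(−20.6[¹¹B]) + 0.22 exp(−0.43[¹¹B]); the function name and the example signal sizes are ours.

```python
import math


def external_reproducibility(signal_volts):
    """2-sigma external reproducibility of d11B (per mil) as a function of
    the 11B signal intensity (volts), assuming the form of equation (1)."""
    return (1.87 * math.exp(-20.6 * signal_volts)
            + 0.22 * math.exp(-0.43 * signal_volts))


# Illustrative signal sizes (volts); smaller signals give larger uncertainty.
for volts in (0.1, 0.3, 0.5):
    print(f"{volts:.1f} V -> 2 sigma = {external_reproducibility(volts):.2f} per mil")
```

The first exponential term dominates at very small signals and decays quickly, so the uncertainty for typical signals is controlled mainly by the second term.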

Trace elements were measured on a Thermo Scientific Element 2 single collector
ICPMS at the University of Southampton, following established methods. Over the
period 2012–2013, analytical reproducibility for Mg/Ca was ±2.7% (2σ). Raw Mg/Ca
ratios were corrected for changes in the Mg/Ca ratio of seawater ((Mg/Ca)_sw),
using the approach of ref. 55 and the power-law modification of ref. 56 and the
modelled (Mg/Ca)_sw of ref. 57. Specifically, we use an H value of 0.41,
originally derived for G. sacculifer, because no species-specific H value is
currently available for G. ruber (for extended discussion, see ref. 48). The
following equation was therefore used to derive calcification temperatures (in
units of °C) from our Mg/Ca ratios, which also includes a depth correction to
account for the influence of dissolution on shell Mg/Ca ratios.
where (Mg/Ca)_sw^t is the Mg/Ca ratio of seawater at the time t of interest,
(Mg/Ca)_test is the Mg/Ca of the foraminiferal test, Z is the core depth in
kilometres and E is defined by the following equation:

Trace element data were also used to check the efficiency of the foraminiferal
cleaning procedure. All samples had Al/Ca ratios of <100 µmol mol⁻¹, and
typically <60 µmol mol⁻¹.

Determination of pH from δ¹¹B of G. ruber. Boron in seawater exists mainly as
two different species, boric acid (B(OH)₃) and the borate ion (B(OH)₄⁻), and
their relative abundance is pH dependent. There are two isotopes of boron, ¹¹B
(~80%) and ¹⁰B (~20%), with a ratio normally expressed in delta notation (in per
mil, ‰) as:

$$\delta^{11}\mathrm{B} = \left[\frac{\left({}^{11}\mathrm{B}/{}^{10}\mathrm{B}\right)_{\mathrm{sample}}}{\left({}^{11}\mathrm{B}/{}^{10}\mathrm{B}\right)_{\mathrm{NIST951}}} - 1\right] \times 1{,}000 \qquad (3)$$

where (¹¹B/¹⁰B)_NIST951 is the isotopic ratio of the NIST SRM 951 boric acid
standard (¹¹B/¹⁰B = 4.04367; ref. 60).
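
The delta notation can be made concrete with a short sketch. The function names are ours; the only number taken from the text is the ¹¹B/¹⁰B ratio of the NIST SRM 951 standard.

```python
# 11B/10B ratio of the NIST SRM 951 boric acid standard, as quoted in the text.
NIST951_RATIO = 4.04367


def delta11b(ratio_sample, ratio_standard=NIST951_RATIO):
    """d11B (per mil) of a sample relative to the standard."""
    return (ratio_sample / ratio_standard - 1.0) * 1000.0


def ratio_from_delta(delta, ratio_standard=NIST951_RATIO):
    """Inverse: the 11B/10B ratio corresponding to a given d11B (per mil)."""
    return ratio_standard * (1.0 + delta / 1000.0)


# Ratio corresponding to the modern seawater value quoted later in the text.
seawater_ratio = ratio_from_delta(39.61)
```

By construction the standard itself has δ¹¹B = 0 ‰, and positive values indicate enrichment in ¹¹B relative to the standard.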

There is a pronounced isotopic fractionation between the two dissolved boron
species, with boric acid being enriched in ¹¹B by 27.2‰ (ref. 61). As the
concentration of each species is pH dependent, their isotopic composition also
has to change with pH in order to maintain a constant seawater δ¹¹B. Calibration
studies have shown that the borate species is predominantly incorporated into
foraminiferal CaCO₃, and therefore ocean pH can be calculated from the δ¹¹B of
borate (δ¹¹B_borate) as follows:

where pK*_B is the dissociation constant for boric acid at in situ temperature,
salinity and pressure, δ¹¹B_sw is the isotopic composition of seawater (39.61‰;
ref. 65), δ¹¹B_borate is the isotopic composition of borate ion, and ¹¹⁻¹⁰K_B is
the isotopic fractionation between the two aqueous species of boron in seawater
(1.0272 ± 0.0006) (ref. 61).
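
The pH equation itself is not legible in this copy. The sketch below therefore assumes the form that is standard in the boron-isotope literature and consistent with the symbols defined above, pH = pK*_B − log₁₀[−(δ¹¹B_sw − δ¹¹B_borate) / (δ¹¹B_sw − α·δ¹¹B_borate − 1000(α − 1))], with α the fractionation factor 1.0272. The function name is ours, and the pK*_B and δ¹¹B_borate values in the example are illustrative, not taken from the study.

```python
import math

D11B_SW = 39.61   # per mil, modern seawater (value quoted in the text)
ALPHA_B = 1.0272  # boric acid / borate fractionation factor (quoted in the text)


def ph_from_borate(d11b_borate, pk_b, d11b_sw=D11B_SW, alpha=ALPHA_B):
    """pH from the d11B of borate, assuming the standard literature form:
    pH = pK*_B - log10(-(sw - borate) / (sw - alpha*borate - 1000*(alpha - 1)))
    """
    numerator = d11b_sw - d11b_borate
    denominator = d11b_sw - alpha * d11b_borate - 1000.0 * (alpha - 1.0)
    return pk_b - math.log10(-numerator / denominator)


# Illustrative inputs: pK*_B of about 8.60 (warm surface seawater) and a
# borate d11B of 20 per mil.
example_ph = ph_from_borate(20.0, 8.60)
```

Higher δ¹¹B_borate values map to higher pH, which is the basis of the proxy; pK*_B must be evaluated at the in situ temperature, salinity and pressure as the text describes.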

In our calculations, the temperature for ODP Site 999 is derived from Mg/Ca
ratios measured on aliquots (separated after dissolution) of the same samples as
those used for δ¹¹B analysis and for ODP Site 662 from published records of
temperature using the Uᴷ′₃₇ proxy. Despite the uncertainty in Mg/Ca-derived SSTs
we have not used published Uᴷ′₃₇ temperature records for ODP Site 999 because
they are of lower temporal resolution and close to saturation (T = 28–29 °C).
Salinity has little influence on the calculations of pH (±1 psu = ±0.006 pH
units), and therefore is assumed to be constant at 35 psu (similar to the
present-day mean annual average at both locations). The uncertainty associated
with this assumption is propagated into pCO₂ᵃᵗᵐ calculations.

Boron has a long residence time in seawater (10–20 Myr; ref. 67), and to account
for likely (small) changes in the boron isotopic composition of seawater
(δ¹¹B_sw) over the last 3 Myr, we use a simple linear interpolation between
modern δ¹¹B_sw (39.61‰; ref. 65) and the δ¹¹B_sw determined by ref. 68 for the
middle Miocene (12.72 Myr ago; δ¹¹B_sw = 37.8‰). This simple estimation yields
δ¹¹B_sw = 39.2‰ at 3 Myr ago, which is consistent with available independent
constraints, for example those based on assumptions of bottom-water pH and
measured benthic foraminiferal δ¹¹B (ref. 69).
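
The seawater δ¹¹B estimate can be reproduced directly from the two tie points quoted above. The function name is ours; the tie points are the modern value and the middle Miocene value given in the text.

```python
# Tie points quoted in the text: (age in Myr, d11B of seawater in per mil).
MODERN = (0.0, 39.61)
MIDDLE_MIOCENE = (12.72, 37.8)


def d11b_seawater(age_myr):
    """Seawater d11B (per mil) from a straight line through the two tie points."""
    (t0, d0), (t1, d1) = MODERN, MIDDLE_MIOCENE
    return d0 + (d1 - d0) * (age_myr - t0) / (t1 - t0)


print(round(d11b_seawater(3.0), 1))  # 39.2
```

The slope is about −0.14 ‰ per Myr, so the correction across the 2.3–3.3 Myr study interval is small compared with the ±0.4 ‰ uncertainty assigned to δ¹¹B_sw.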

Finally, to calculate pH from the δ¹¹B of G. ruber, it is necessary to account
for species-specific differences between δ¹¹B_borate in ambient seawater and
δ¹¹B in foraminiferal calcite (δ¹¹B_calcite; that is, 'vital effects'). Here we
used the species- and size-specific calibration equation of ref. 18 for G. ruber
in the size range 300–355 µm (see equation (5)). This equation has been applied
in previous studies to produce a δ¹¹B-based pCO₂ᵃᵗᵐ record for the last 30 kyr
that is in very good agreement with ice-core pCO₂ᵃᵗᵐ records (Extended Data
Fig. 2).


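$$\delta^{11}\mathrm{B}_{\mathrm{borate}} = \frac{\delta^{11}\mathrm{B}_{\mathrm{calcite}} - 8.87\,(\pm 1.52)}{0.60\,(\pm 0.08)} \quad \text{(uncertainty at } 2\sigma\text{)} \qquad (5)$$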


It is important to note that not only is there generally good preservation of
the sites we use, but also the δ¹¹B of G. ruber does not appear to be greatly
affected by partial dissolution.
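
The calibration of equation (5) is a simple linear inversion, sketched below using the central values of the intercept (8.87) and slope (0.60). The function name and the example input are ours, and the sketch does not propagate the quoted 2σ uncertainties on the two coefficients.

```python
def borate_from_calcite(d11b_calcite, intercept=8.87, slope=0.60):
    """d11B of borate (per mil) from the measured d11B of G. ruber calcite,
    using the central values of the calibration in equation (5)."""
    return (d11b_calcite - intercept) / slope


# Illustrative measured value (per mil), not a datum from the study.
example_borate = borate_from_calcite(20.87)
```

Because the slope is less than 1, a given change in measured δ¹¹B_calcite corresponds to a larger change in δ¹¹B_borate, which is why the calibration uncertainty is carried into the pCO₂ calculation.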
Determination of pCO₂ᵃᵗᵐ from δ¹¹B-derived pH. Another variable of the ocean
carbonate system is required besides pH to calculate the partial pressure of CO₂
in seawater, pCO₂ˢʷ (ref. 70). Here, total alkalinity (TA) is assumed to be
constant at values similar to modern values at ODP Site 999 (2,330 µmol kg⁻¹;
ref. 20). It is important to note that pCO₂ˢʷ estimates are mostly determined by
the reconstructed pH and that TA has little influence. This is because pH
reflects the ratio of TA to DIC (total dissolved inorganic carbon), so when pH
is known the TA:DIC ratio is set, and the effect on pCO₂ˢʷ of a large
increase/decrease in TA is partially countered by an opposite change in DIC.
Indeed, at a given pH, a 10% change in TA results in a pCO₂ˢʷ change of only 10%.
For example, modifying TA by ±100 µmol kg⁻¹ (a range equivalent to modelled
variations in TA for the last 2 Myr; ref. 30) modifies reconstructed pCO₂ˢʷ (when
pH is known) by less than ±12 µatm.

pCO₂ˢʷ was calculated using the equations of ref. 70, the "seacarb" package of R
(statistical software, see ref. 71) and a Monte Carlo approach (n = 10,000) to
fully propagate the uncertainty in the input parameters (at 95% confidence or
full range, where appropriate): δ¹¹B (±analytical uncertainty, calculated using
equation (1), and the calibration uncertainty in equation (5)), the
Mg/Ca-derived temperature (±3 °C), the salinity (±3 psu), TA (±175 µmol kg⁻¹),
and δ¹¹B_sw (±0.4‰). pCO₂ᵃᵗᵐ was then calculated from pCO₂ˢʷ using Henry's Law and
subtracting the modern disequilibria with respect to CO₂ at the two sites
(Extended Data Fig. 1; Supplementary Tables 1 and 2). Note that for the quoted
uncertainty range for temperature, salinity, and δ¹¹B_sw a normal distribution is
assumed. However, for TA we have assumed a 'flat' probability (that is, an equal
probability of TA being any value in the range 2,155–2,505 µmol kg⁻¹). We
therefore do not ascribe weight to the assumption that TA remains constant, but
rather fully explore the likely range given the available, model-based,
constraints. It should also be noted that salinity and temperature have little
effect on our estimated pCO₂ (±1 psu = ±0.2 µatm; ±1 °C = ±8 µatm).
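
The input sampling for such a Monte Carlo propagation might look like the sketch below. It covers only the drawing of inputs; the carbonate-system calculation itself (done with "seacarb" in R in the text) is not reproduced. The function name is ours, and treating the quoted 95% ranges as ±2 standard deviations of a normal distribution is our assumption.

```python
import random


def draw_inputs(temp_c, d11b_sw, n=10000, seed=1, salinity=35.0):
    """Draw n Monte Carlo realizations of the carbonate-system inputs.

    Temperature, salinity and seawater d11B are normal, with the quoted 95%
    ranges (+/-3 degC, +/-3 psu, +/-0.4 per mil) taken as 2 sigma (assumption).
    Total alkalinity is uniform over 2,155-2,505 umol/kg, as in the text.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        draws.append({
            "temperature": rng.gauss(temp_c, 3.0 / 2.0),
            "salinity": rng.gauss(salinity, 3.0 / 2.0),
            "d11b_sw": rng.gauss(d11b_sw, 0.4 / 2.0),
            "alkalinity": rng.uniform(2155.0, 2505.0),
        })
    return draws


# Illustrative central values (not data from the study).
realizations = draw_inputs(temp_c=27.0, d11b_sw=39.2)
```

Each realization would then be passed, together with a perturbed δ¹¹B value, through the pH and carbonate-system calculations, and the spread of the resulting pCO₂ values gives the uncertainty envelope.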
Comparison with published records of Pliocene pCO₂ᵃᵗᵐ. Figure 1 and Extended
Data Fig. 2b, c show a comparison of our high-resolution δ¹¹B-derived pCO₂ᵃᵗᵐ
record with published records. As noted in the main text, although the various
approaches agree, in detail our record exhibits more structure. As a
consequence, cross plots of




the previously published CO₂ data against ΔMAT (or SSTₛₜ) are largely incoherent
(Extended Data Fig. 2d–f). In the case of the stomatal estimates and the
existing δ¹¹B-based records, this is mainly a consequence of their low temporal
resolution, although analytical issues and species choice (we use G. ruber,
which spends its entire life cycle in the mixed layer, whereas ref. 23 uses
G. sacculifer, which migrates during its life cycle and whose δ¹¹B, unlike that
of G. ruber, is modified by partial dissolution) may also have a role in the
discrepancy with earlier δ¹¹B records (see ref. 25 for further discussion). The
lack of variability through the Pliocene for the alkenone-based records may be
related to changes in the size of the alkenone producers, fluctuations in
nutrient content/water depth of maximum alkenone production, and/or variations
in the degree of passive versus active uptake of CO₂ by the alkenone-producing
coccolithophorids.

Continuous records of Plio-Pleistocene global temperature change. Robust records
of global temperature change are needed to determine how Earth's climate has
responded to changes in CO₂. Here we estimate this variable using two
independent approaches: (1) we generate a stack of available sea surface
temperature records (SSTₛₜ); and (2) following ref. 35 we use a reconstruction of
global mean annual surface air temperature change based on a scaling of the
Northern Hemisphere temperature required by a simple coupled ice-sheet-climate
model to predict the benthic δ¹⁸O stack of ref. 76 (tuned here to the LR04 age
model; ΔMAT).

For the SST stack we imposed a number of criteria for site selection. These are:
(1) the record must be continuous from the late Pliocene to the late Pleistocene
(or nearly so); (2) the temporal resolution must be relatively high (ideally
better than one sample per 10 kyr; for ODP Site 1237 we have, however, accepted
a lower resolution to increase spatial coverage) to allow us to fully resolve
the dominant orbital-scale variability; (3) the SST record must be based on
Uᴷ′₃₇, given that Mg/Ca suffers an unacceptable level of uncertainty on these
timescales owing to the secular evolution of the Mg/Ca ratio of seawater (for
example, ref. 48); and (4) the temperatures recorded by the Uᴷ′₃₇ proxy must be
less than 29 °C, above which the proxy becomes saturated and therefore
unresponsive. Ten published records meet these criteria (ODP Sites 982, 607,
1012, 1082, 1239, 846, 662, 722, 1237 and 1090; refs 66, 77–84) and the
locations of these sites are shown in Extended Data Fig. 1b. The average
temporal resolution of these records is one sample about every 4 kyr (ranging
from ~2 kyr to ~13 kyr) and the published age model of each site is either part
of the LR04 stack or was tuned to it (see the original publications for
details).

To stack the records, each was first converted to a relative SST record
referenced to either the average of the Holocene (0–10 kyr), or mean annual
modern SST if the Holocene is missing, and then linearly interpolated to a 5-kyr
spacing. These relative records are then averaged to produce a single stacked
record of relative SST change (SSTₛₜ; Supplementary Table 5). The number of sites
contributing to the SST stack varies but for most of the record is ≥8 (Extended
Data Fig. 6a, b). Uncertainty on the SST stack is estimated by a Monte Carlo
procedure where 1,000 realizations are made of each individual SST record with
noise added reflecting the magnitude of analytical uncertainty in the Uᴷ′₃₇ SST
reconstruction (±1 °C at 2σ; ref. 92). Since we are using the same proxy for
each location it is not necessary to consider the calibration uncertainty, as
this should be the same for each record. Each SST realization is then averaged
to produce 1,000 realizations of the SST stack. The mean of these 1,000
realizations is then calculated and the 95% confidence interval is given by the
2.5th and 97.5th percentiles (red band on Fig. 3). Jackknifing of the SST stack
(that is, the sequential removal of one record at a time) indicates that no
particular record has undue influence and the SST stack remains close to the
bounds relating to analytical uncertainty (the grey lines on Extended Data
Fig. 6c, d).
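
The stacking procedure described above can be sketched as follows. This is a minimal illustration with hypothetical function names and synthetic records; it covers referencing, interpolation to a 5-kyr grid and averaging, and leaves out the Monte Carlo noise and jackknife steps. The `baseline` argument stands in for the modern mean annual SST used when the Holocene is missing.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Linear interpolation on ascending xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def to_anomaly(ages, sst, baseline=None, holocene=(0.0, 10.0)):
    """Reference a record to its Holocene (0-10 kyr) mean, or to a supplied
    baseline (for example, modern mean annual SST) if given."""
    if baseline is None:
        ref = [v for a, v in zip(ages, sst) if holocene[0] <= a <= holocene[1]]
        baseline = sum(ref) / len(ref)
    return [v - baseline for v in sst]


def stack(records, step=5.0, start=0.0, end=3500.0):
    """Average relative SST records on a common grid (ages in kyr).
    Each record contributes only within its own age range."""
    anomalies = [(ages, to_anomaly(ages, sst)) for ages, sst in records]
    n = int(round((end - start) / step))
    grid = [start + i * step for i in range(n + 1)]
    stacked = []
    for g in grid:
        vals = [interp(g, ages, anom) for ages, anom in anomalies
                if ages[0] <= g <= ages[-1]]
        stacked.append(sum(vals) / len(vals) if vals else None)
    return grid, stacked


# Two synthetic records (ages in kyr, SST in degC), for illustration only.
site_a = ([0.0, 5.0, 10.0, 15.0, 20.0], [20.0, 20.0, 20.0, 18.0, 18.0])
site_b = ([0.0, 10.0, 20.0], [10.0, 10.0, 6.0])
grid, sst_stack = stack([site_a, site_b], end=20.0)
```

Referencing each record to its own baseline before averaging means that sites with very different absolute temperatures contribute equally to the stacked change.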

Our aim with the SST stack was not specifically to reconstruct global SST change 
but rather to examine the change in SST at these locations for a given forcing in the 
Pliocene and Pleistocene. We therefore do not require the SST stack to reflect 
global SST change. However, in order to assess how well the SST stack does reflect 
global SST we: (1) Examined the mean of historic SST change (1870 AD to 2013 AD; 
from the HadISST database; ref. 85) at each location where we have an alkenone 
palaeo-SST record. This comparison is shown in Extended Data Fig. 7 (blue circles). 
Despite exhibiting more variability than the mean annual global average (red in 
Extended Data Fig. 7), these ten sites clearly capture the global long-term trend in 
global mean SST over the last 140 years or so (Extended Data Fig. 7). (2) Compared
the SST stack to the more comprehensive and independent multi-proxy compilation
of ref. 32, which covers the last 100 kyr with >30 sites and the last 278 kyr
with >10 sites. When data for the last 278 kyr are stacked together in a similar way
to the SST stack, the stack of ref. 32 (blue on Fig. 3c) compares well with the SST
stack, giving us confidence that it closely reflects global SST change. (3) Compared the
SST stack to discrete global reconstructions of SST. For the last glacial (20-25 kyr),
the SST stack gives a ΔSST of −2.2 ± 0.4 K, which is close to the ΔSST of −3.2 K
from a recent comprehensive compilation for the Last Glacial Maximum and is
within uncertainty of earlier reconstructions (for example, ref. 91, where ΔSST is
−1.9 ± 1.8 K). For the Mid-Pliocene Warm Period (3-3.3 Myr ago), our SST stack
gives an average of +2.3 K. A simple mean calculated from the larger multi-proxy
PRISM SST compilation of ref. 40 is very similar at +2.6 K. The SST stack is
slightly warmer than an area-weighted mean of the PRISM SST set (+2 K; ref. 40).
Taken together, these comparisons clearly indicate that, although the SST stack is made
of a limited number of sites, it does appear to closely reflect change in global SST.
This conclusion is also supported by the general agreement between the trends
(but not absolute values) exhibited by ΔMAT and the SST stack through the Pliocene
and Pleistocene (Fig. 3), with subtle differences between these two climate
records (for example, at 2.8 Myr ago) potentially a result of a decoupling between
deep- and surface-water temperature evolution, small spatial biases in our SST stack,
and/or minor age-model inaccuracies (the conversion of depth below seafloor in a
marine sediment core to age).
Regression-based determinations of climate sensitivity. To examine the climatic
response (expressed as either ΔMAT or ΔSST) to forcing by CO₂ and land-ice albedo
changes in both time periods, we used a linear regression approach. Because each
variable used (CO₂ and sea level, ΔMAT or ΔSST) has an associated uncertainty,
however, it is necessary to fully explore the influence of these uncertainties on our
estimates of slope determined using least-squares linear regression. Owing to the
difficulty of performing the least-squares linear regression with uncertainty in x
and y variables that are not necessarily normally distributed, we have used a
two-stage approach to fully propagate all the uncertainties involved. First, we generated
1,000 realizations of each temporal record of each variable (for example, ΔF_CO2,
ΔF_CO2,LI, ΔMAT or ΔSST) based on a random sampling of each record within its
uncertainty envelope. This uncertainty envelope was either a simple normal
distribution (for example, ±6 parts per million for ice-core CO₂) or based on other
Monte Carlo output (for example, random sampling of the 10,000 simulations of
the Pliocene δ¹¹B-pCO₂^atm record or the 1,000 realizations of the SST stack; see above).
Then the first realization of the ΔF_CO2 (or ΔF_CO2,LI) record was regressed against
the first realization of the ΔMAT (or ΔSST) with the uncertainty in the slope and
intercept of that regression determined using a bootstrapping approach (n = 1,000;
ref. 88). The second realization of the forcing term and the climate response was
then regressed and the 1,000 estimates of slope and intercept by bootstrapping were
combined with the 1,000 of the first regression. This continued for all 1,000 realizations
and a probability density function for the slope and intercept, accounting for x
and y uncertainty, was then constructed from the combined bootstrap estimates for
each realization (n = 1,000,000). The results of this approach are shown in Fig. 5.
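
The two-stage procedure can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, both variables are given simple normal uncertainties with a single standard deviation each, and the bootstrap resamples (x, y) pairs with replacement. The defaults use 200 realizations and 200 bootstrap draws to keep the run short; the text uses 1,000 of each.

```python
import random
from statistics import mean

def ols(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def slope_distribution(x, x_sd, y, y_sd, n_real=200, n_boot=200, seed=0):
    """Pool bootstrap slope estimates over many Monte Carlo realizations."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_real):
        # Stage 1: draw one realization of each record within its uncertainty.
        xr = [rng.gauss(xi, x_sd) for xi in x]
        yr = [rng.gauss(yi, y_sd) for yi in y]
        # Stage 2: bootstrap the regression of this realization.
        for _ in range(n_boot):
            idx = [rng.randrange(n) for _ in range(n)]
            bx = [xr[i] for i in idx]
            if max(bx) == min(bx):
                continue  # degenerate resample: slope undefined
            by = [yr[i] for i in idx]
            slopes.append(ols(bx, by)[0])
    return slopes

# Hypothetical forcing (x) and temperature response (y) with a true slope of 2.
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 for xi in x]
slopes = sorted(slope_distribution(x, 0.5, y, 0.5))
median_slope = slopes[len(slopes) // 2]
```

The pooled `slopes` list plays the role of the combined bootstrap estimates from which a probability density function of the slope would be built.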

As noted above, pCO₂^atm (and hence ΔF_CO2) calculated from boron isotopes is a
function of not only the measured δ¹¹B but also the total alkalinity (TA; or other
second carbonate system variable) and, beyond the last million years or so, the boron
isotopic composition of seawater (δ¹¹B_sw). This is illustrated in Extended Data Fig. 8.
Here pCO₂^atm is calculated from an artificial δ¹¹B and temperature record (Extended
Data Fig. 8a), a TA of either 2,000 μmol kg⁻¹, 2,300 μmol kg⁻¹ or 2,600 μmol kg⁻¹,
a δ¹¹B_sw of 38.8‰, 39.6‰ (that is, modern) or 40.4‰ (Extended Data Fig. 8) and
the assumption that pCO₂^sw = pCO₂^atm. These parameter choices result in a large
difference in absolute CO₂ but, although they are extreme and perhaps unlikely for
the Pliocene, the slope of a linear regression of global temperature change and ΔF_CO2
is very similar for each set of parameters (Extended Data Fig. 8c, d), so much so
that even with only a poor knowledge of δ¹¹B_sw (for example, ±0.8‰) and TA (for
example, ±300 μmol kg⁻¹) the accuracy of the relationship between reconstructed
ΔF_CO2 and temperature is not unduly affected.

The residence time of boron in seawater (10-20 Myr) ensures that changes in
δ¹¹B_sw across the time interval examined here (1 Myr) are unlikely to be large (<0.1‰;
ref. 67) and so uncertainty in the absolute value of δ¹¹B_sw and any changes across
the study interval can be ignored for our determinations of S^p. In all the previous
calculations we assume that TA is randomly distributed between 2,155 μmol kg⁻¹
and 2,505 μmol kg⁻¹, therefore accounting for all possible trends in TA across the
time interval studied within this range. However, to better examine the influence
of a large secular shift in TA on our estimates of S^p we have imposed a 200 μmol kg⁻¹
decrease (TA_d) or increase (TA_i) across our Pliocene study interval. The slopes for
the regressions using one parameter set (VDW11 and sea level values from ref. 46
recalculated by ref. 12) but with such a varying TA are shown in Extended Data
Fig. 8e, f. Even this relatively large secular change does not have a major influence
on the estimated slope, clearly illustrating that our assumptions regarding TA, both
its absolute value and its secular evolution, have little influence on our calculated
ΔF_CO2 and hence our conclusions.

Pliocene pCO₂^atm variability. The apparent cyclicity in our Pliocene CO₂ record
can be investigated using spectral analysis. Extended Data Fig. 4c shows the
evolutive power spectra for the Pliocene pCO₂^atm, in which a ~100-kyr cycle is clearly
dominant. Our sampling resolution is one sample per ~13 kyr, which is not sufficient
to resolve cycles of a precessional length (for example, 19 kyr and 23 kyr)
but may be adequate to resolve obliquity (~41-kyr length), yet these cycles are
apparently absent in the generated spectra (Extended Data Fig. 4c). To ensure our
resolution is not biasing this result we have sampled the LR04 benthic δ¹⁸O stack at
our exact sampling resolution and examined the evolutive power spectra of this
sampled record (Extended Data Fig. 4d). This analysis reveals the presence of 100-kyr
and 41-kyr cycles in the δ¹⁸O data, despite our relatively low resolution,
supporting the observation that the dominant cycle in Pliocene pCO₂^atm is ~100 kyr.
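
The resolution argument above follows from the sampling (Nyquist) limit: a cycle needs more than two samples per period to be resolved. The check below is a minimal sketch of that arithmetic, assuming evenly spaced samples; the function names are illustrative only.

```python
def shortest_resolvable_period(sample_spacing_kyr):
    """Nyquist limit: periods at or below twice the sample spacing alias."""
    return 2.0 * sample_spacing_kyr

def is_resolvable(period_kyr, sample_spacing_kyr):
    """True if the cycle is longer than the Nyquist period."""
    return period_kyr > shortest_resolvable_period(sample_spacing_kyr)

spacing = 13.0  # approximate sample spacing quoted in the text, in kyr
for period in (19.0, 23.0, 41.0, 100.0):
    status = "resolvable" if is_resolvable(period, spacing) else "not resolvable"
    print(f"{period:.0f}-kyr cycle: {status}")
```

With a ~13-kyr spacing the limit is 26 kyr, so the 19-kyr and 23-kyr precession cycles fall below it while the 41-kyr and 100-kyr cycles lie above it.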

The magnitude of Pliocene pCO₂^atm variability, shown in Extended Data Fig. 4a,
is similar to that exhibited by published late and mid-Pleistocene δ¹¹B-pCO₂^atm
records (green and red lines on Extended Data Fig. 4a) and by the Late Pleistocene
ice-core data when noise that is approximately equivalent to our δ¹¹B-pCO₂^atm
uncertainty is added (±35 μatm; black dashed line on Extended Data Fig. 4a). In
contrast, the δ¹⁸O variability for these time intervals increases markedly from the
Pliocene to late Pleistocene as the magnitude of glacial-interglacial cycles increases
(Fig. 1e, Extended Data Fig. 4b).
with the mean annual modern ΔpCO₂ from the reconstruction of ref. 21. b, Map
of the sites (and labelled with their depths) used to generate the SST stack with
constructed and data visualized in Ocean Data View.
Extended Data Figure 2 | Comparisons of boron-isotope-based pCO₂^atm
estimates with other methodologies and archives. a, Estimates of pCO₂^atm
from published δ¹¹B records compared to ice-core CO₂ (red line; refs 27-29).
The dotted line is for pCO₂ = 278 μatm. In a the data of ref. 20 (blue circles)
have been recalculated in the same manner as described here for the Pliocene,
including using the G. ruber δ¹¹B-pH calibration of ref. 18. The error band
encompasses 68% (dark blue) and 95% (light blue) of 10,000 Monte Carlo
simulations of pCO₂^atm (see main text). Also shown is the G. sacculifer-based
δ¹¹B-pCO₂^atm record of ref. 30 (green circles). In this case error bars (25 μatm)
are as determined in that study. Despite similar analytical uncertainty, the
smaller error bars for the ref. 30 data result from these authors not propagating
the δ¹¹B-pH calibration uncertainty and considering a smaller range in
temperature, salinity and alkalinity uncertainty than in this study (±0.76 °C,
±1 psu, ±27 μmol kg⁻¹ versus ±3 °C, ±3 practical salinity units (psu),
±175 μmol kg⁻¹ with a flat probability in this study). b, δ¹¹B-based pCO₂^atm
record generated here (blue closed circles and 95% and 68% uncertainty bands)
with pCO₂^atm from the δ¹³C of alkenones from published studies. See Fig. 1
legend for details. c, δ¹¹B-based pCO₂^atm record generated here (blue closed
circles and 95% and 68% uncertainty bands) with pCO₂^atm from previous
δ¹¹B-based studies and from plant stomata. See Fig. 1 legend for details.
d-f, Comparison of cross plots of CO₂ forcing and ΔMAT for our high-resolution
δ¹¹B-CO₂ record (d), published alkenone CO₂ data (e) and
published low-resolution δ¹¹B-CO₂ data (f). In each panel the slopes of
regression lines fitted through the data are labelled (±1 standard error, s.e.).
In d ice-core CO₂ data are shown as red open circles and Pliocene δ¹¹B-CO₂ as
open blue circles. In e and f, ice-core CO₂ data are shown in grey for clarity.
In e, alkenone CO₂ data are from the following sources: ODP 1208 (orange),
ODP 806 (purple); ODP 925 (brown); ODP 999 (green circles; green
squares). In f, δ¹¹B-CO₂ data are from ODP 999 (blue and red).
Extended Data Figure 3 | Probability density functions for equivalently
aged samples from ODP Site 662 and ODP Site 999. Each panel, labelled with
age (in units of kyr ago), shows the probability density function for a given
estimate of pCO₂^atm from ODP Site 662 (red) and ODP Site 999 (blue). In most
instances equal age samples are compared, but in some cases either where the
variability is high and/or equivalent age samples are absent, we show
neighbouring samples from ODP Site 999 (for example, bottom left and right).
This comparison indicates that although the mean pCO₂^atm of ODP 662 tends to
be higher than ODP 999, there is always a high degree of overlap between
estimates from the two sites.
Extended Data Figure 4 | Probability density functions of pCO₂^atm and
benthic δ¹⁸O and time series analysis. a, Probability density functions of the
residuals of δ¹¹B-pCO₂^atm about the long-term trend for the late Pliocene (this
study; blue line), the mid-Pleistocene (green line) and late Pleistocene
(red line). Dashed vertical lines show the upper and lower limit (labelled)
encompassing 90% of the data. The residual of the ice-core CO₂ record
about the long-term mean for 0-0.8 Myr ago plus a random noise equivalent to
±35 μatm (the typical δ¹¹B-CO₂ uncertainty) is shown as a black dashed
probability density function. b, Probability density functions of the residual of
LR04 benthic δ¹⁸O from the long-term trend for the late Pleistocene (red),
mid-Pleistocene (green) and late Pliocene (blue). Dashed vertical lines show the
upper and lower limit (labelled) encompassing 90% of the data. In contrast to
pCO₂^atm, δ¹⁸O clearly exhibits an increase in variability over the last 3.3 Myr.
c, d, Evolutive power spectral analyses of Pliocene pCO₂^atm (c) and resampled
δ¹⁸O (ref. 22) (d). The evolutive power spectra were computed using the
fast Fourier transform of overlapping segments with a 300-kyr moving window.
Before spectral analysis, all series were notch-filtered to remove the long-term
trend (bandwidth = 0.005), and interpolated to 12-kyr intervals (the real
resolution of our record is ~13.5 kyr).
Extended Data Figure 5 | Summary of sea-level records used to calculate
ΔF_LI. In a and b the red curve is from ref. 13 (R14) based on the planktic δ¹⁸O
from the Mediterranean Sea and the methods developed for the Red Sea by
ref. 93. We have removed those intervals identified as possible sapropel
(organic-matter-rich sediments) events and linearly interpolated across gaps in
the original record. The black curve is the sea-level record from an inversion of
the benthic oxygen isotope record of ref. 76 (tuned to LR04 here) using an
ice sheet model (VDW11). The blue curve in a is based on the planktic/bulk
δ¹⁸O from the Red Sea for the interval 0-520 kyr and the paired Mg/Ca
and benthic δ¹⁸O from the deep South Pacific for the interval 520-800 kyr
(ref. 45) (R09+E12). The green curve in b is based on a scaling of the LR04 δ¹⁸O
stack to indicators of sea level from sequence stratigraphy (ref. 46 recalculated
by ref. 12). In each the uncertainty in the reconstruction at 95% confidence
is shown by an appropriately coloured error band. Marine isotope stages
mentioned in text are labelled. RSL, relative sea-level change (in metres),
relative to the modern value.
Extended Data Figure 6 | Stacked sea surface temperature record.
a, b, Number of records that contribute to the SST stack through time.
c, d, Uncertainty in the SST stack due to analytical uncertainty (at 95%
confidence; red band) and showing the influence of jackknifing (that is,
removing one record at a time; grey lines show maximum and minimum). Note
that the jackknifing illustrates that no single record has an undue influence on
the SST stack.
Extended Data Figure 7 | Comparison of global SST from the HadSST3 data
set with SST from HadISST1 at ODP sites. a, Historic global mean annual sea
surface temperature anomaly from the HadSST3 data set (red circles)
and mean SST at locations above the ODP sites that make up the SST stack from
HadISST1 (blue; local SST). Thick red and blue lines are non-parametric
smoothers through both data sets. b, Cross plot of global mean annual SST
and local SST. The regression line determined using linear regression has a
slope of ~1 and intercept of close to 0, so local SST captures the global trend
well. The shaded blue band in b represents the 95% confidence interval of the
regression line.
Extended Data Figure 8 | The influence of TA and δ¹¹B_sw on determinations
of S^p using linear regression. a, b, Artificial δ¹¹B record (where δ¹¹B_foram
is the boron isotopic composition of an artificial foraminifera; a) and
temperature record (b). c, d, Cross plot and regressions of δ¹¹B-ΔF_CO2 and
global temperature for TA dramatically varying in the range 2,000-2,600 μmol kg⁻¹
(TA; c) and δ¹¹B_sw from 38.8‰ to 40.4‰ (d). The slopes of
the regressions, which are very similar regardless of parameter choice, are
colour-coded and listed in the bottom right-hand corner of c and d.
e, f, Probability density function of slope for regressions of Pliocene-aged
ΔMAT against ΔF_CO2 (e) and ΔF_CO2,LI (f), where TA is decreasing by
200 μmol kg⁻¹ (dashed) and increasing by 200 μmol kg⁻¹ (dotted). Note that
despite large variations in TA the slope of the regressions does not change greatly.
Biocontainment of genetically modified
organisms by synthetic protein design


Daniel J. Mandell, Marc J. Lajoie, Michael T. Mee, Ryo Takeuchi, Gleb Kuznetsov, Julie E. Norville, Christopher J. Gregg,
Barry L. Stoddard & George M. Church


Genetically modified organisms (GMOs) are increasingly deployed at large scales and in open environments. Genetic 
biocontainment strategies are needed to prevent unintended proliferation of GMOs in natural ecosystems. Existing 
biocontainment methods are insufficient because they impose evolutionary pressure on the organism to eject the 
safeguard by spontaneous mutagenesis or horizontal gene transfer, or because they can be circumvented by
environmentally available compounds. Here we computationally redesign essential enzymes in the first organism
possessing an altered genetic code (Escherichia coli strain C321.ΔA) to confer metabolic dependence on non-standard
amino acids for survival. The resulting GMOs cannot metabolically bypass their biocontainment mechanisms using
known environmental compounds, and they exhibit unprecedented resistance to evolutionary escape through
mutagenesis and horizontal gene transfer. This work provides a foundation for safer GMOs that are isolated from
natural ecosystems by a reliance on synthetic metabolites.


GMOs are rapidly being deployed for large-scale use in bioremediation,
agriculture, bioenergy and therapeutics. In order to protect natural
ecosystems and address public concern it is critical that the scientific
community implements robust biocontainment mechanisms to prevent
unintended proliferation of GMOs. Current strategies rely on integrating
toxin/antitoxin ‘kill switches’, establishing auxotrophies for
essential compounds, or both. Toxin/antitoxin systems suffer from
selective pressure to improve fitness through deactivation of the toxic
product, while metabolic auxotrophies can be circumvented by scavenging
essential metabolites from nearby decayed cells or cross-feeding
from established ecological niches. Effective biocontainment strategies
must protect against three possible escape mechanisms: mutagenic drift,
environmental supplementation and horizontal gene transfer (HGT).
Here we introduce ‘synthetic auxotrophy’ for non-natural compounds
as a means to biological containment that is robust against all three
mechanisms. Using the first genomically recoded organism (GRO) we
assigned the UAG stop codon to incorporate a non-standard amino acid
(NSAA) and computationally redesigned the cores of essential enzymes
to require the NSAA for proper translation, folding and function. X-ray
crystallography of a redesigned enzyme shows atomic-level agreement
with the predicted structure. Combining multiple redesigned enzymes
resulted in GROs that exhibit markedly reduced escape frequencies
and readily succumb to competition by unmodified organisms in
non-permissive conditions. Whole-genome sequencing of viable escapees
revealed escape mutations in a redesigned enzyme and also disruption
of cellular protein degradation machinery. Accordingly, reducing the
activity of the NSAA aminoacyl-tRNA synthetase in non-permissive
conditions produced double- and triple-enzyme synthetic auxotrophs with
undetectable escape when monitored for 14 days (detection limit
2.2 × 10⁻¹² escapees per colony forming unit (c.f.u.)). We additionally show
that while bacterial lysate supports the growth of common metabolic
auxotrophs, the environmental absence of NSAAs prevents such natural
products from sustaining synthetic auxotrophs. Furthermore, distributing
redesigned enzymes throughout the genome reduces susceptibility
to HGT. When our GROs incorporate sufficient foreign DNA to overwrite
the NSAA-dependent enzymes, they also revert UAG function,
thereby preserving biocontainment by deactivating recoded genes. The
general strategy developed here provides a critical advance in
biocontainment as GMOs are considered for broader deployment in open
environments.


Computational design of synthetic auxotrophs 


We focused on the NSAA L-4,4′-biphenylalanine (bipA), which has a
size and geometry unlike any standard amino acid, and a hydrophobic
chemistry expected to be compatible with protein cores. We introduced
a plasmid containing a codon-optimized version of the bipA
aminoacyl-tRNA synthetase (bipARS)/tRNA_bipA system into a GRO (genomically
recoded E. coli strain C321.ΔA (ref. 9)), thereby assigning UAG as a
dedicated codon for bipA incorporation. Using a model of bipA in the
Rosetta software for macromolecular modelling we applied our
computational second-site suppressor design protocol to 13,564 core
positions in 112 essential proteins with X-ray structures (Methods). We
refined designs for cores that tightly pack bipA while maximizing
neighbouring compensatory mutations predicted to destabilize the proteins
in the presence of standard amino acid suppressors at UAG positions
(Fig. 1a). We further required that candidate enzymes produce products
that cannot be supplemented by environmentally available compounds.
For example, we rejected glmS designs because glucosamine
supplementation rescues growth of glmS mutants. We selected designs of six
essential genes for experimental characterization: adenylate kinase (adk),
alanyl-tRNA synthetase (alaS), DNA polymerase III subunit delta (holB),
methionyl-tRNA synthetase (metG), phosphoglycerate kinase (pgk) and
tyrosyl-tRNA synthetase (tyrS). For all cases we designed
oligonucleotides (Supplementary Table 1) encoding small libraries suggested by
the computational models (Supplementary Table 2) and used them to
directly edit the target essential gene in C321.ΔA using co-selection
multiplex automated genome engineering (CoS-MAGE). Since tyrS
featured the greatest number of compensatory mutations, we additionally
Figure 1 | Computational design of NSAA-dependent essential proteins.
a, Overview of the computational second-site suppressor strategy.
b, Computational design of an NSAA-dependent tyrosyl-tRNA synthetase
(purple) overlaid on the wild-type structure (green; PDB code 2YXN). Six
substituted residues are shown in stick representation. c, X-ray crystallography
of the redesigned synthetase with an electron density map (2Fo − Fc contoured
at 1.0σ) for substituted residues; substitution F236A is on a disordered loop
and is not observed. d, The crystal structure of the redesigned enzyme (cyan)
superimposed onto the computationally predicted model (purple).


synthesized eight computational tyrS designs and used them to replace
the endogenous tyrS gene (Supplementary Table 3). We screened our
CoS-MAGE populations for bipA-dependent clones by replica plating
from permissive media (containing bipA and arabinose for bipARS
induction) to non-permissive media (lacking bipA and arabinose) and
validated candidates by monitoring kinetic growth in the presence and
absence of bipA (Methods and Extended Data Fig. 1). Mass spectrometry
confirmed the specific incorporation of bipA in redesigned enzymes
(Methods and Extended Data Fig. 2). X-ray crystallography of a
redesigned enzyme at 2.65 Å resolution (Protein Data Bank (PDB) code
4OUD, Extended Data Table 1) shows atomic-level agreement with
computational predictions (Fig. 1b-d, Extended Data Fig. 3 and
Supplementary Discussion). Selectivity for bipA in a redesigned core was
further confirmed by measuring soluble protein content when bipA is
mutated to leucine (wild-type residue) or tryptophan (most similar
natural residue to bipA by mass) (Methods and Extended Data Fig. 4).


Characterization of synthetic auxotrophs 


We characterized the escape frequencies of eight strains by plating on 
non-permissive media with and without bipARS inducer arabinose 
(Fig. 2, Supplementary Tables 4, 5 and Methods). Escapees exhibiting 
varying fitness were detected by the emergence of colonies in the absence 
of bipA. Two tyrS variants (tyrS.d6 and tyrS.d7) and two adk variants 
(adk.d4 and adk.d6) showed robust growth in permissive conditions 
and low escape frequencies in the absence of bipA. Strain alaS.d5 showed 
only minor impairment in the absence of bipA, suggesting that near- 
cognate suppression of the UAG codon by endogenous tRNA or mis- 
charging of natural amino acids by BipARS is adequate to support 
growth. Consistent with this hypothesis, inserting a UAG immediately 
after the start codon (strain alaS.d5.startUAG) further impairs growth 
in the absence of bipA, although bipA dependence is readily overcome 
by mutational escape. HolB recombinants presented only the designed 
bipA mutation (holB.d1) and none of the compensatory mutations, 
suggesting that the intended compensation may be too destabilizing, 
or that the native amino acids at those positions may be critical for 
Figure 2 | Escape frequencies and doubling times of auxotrophic strains.
Escape frequencies are shown for engineered auxotrophic strains calculated as
colonies observed per c.f.u. plated over three technical replicates on solid media
lacking arabinose and bipA. Assay limit is calculated as 1/(total c.f.u. plated)
for the most conservative detection limit of a cohort, with a single-enzyme
auxotroph limit of 3.5 × 10⁻⁹ escapees per c.f.u., a double-enzyme auxotroph
limit of 8.3 × 10⁻¹¹ escapees per c.f.u. and a triple-enzyme auxotroph limit
of 6.41 × 10⁻¹¹ escapees per c.f.u. Positive error bars represent the s.e.m. of the
escape frequency over three technical replicates (Methods). The top panel
presents the doubling times for each strain in the presence of 10 μM or 100 μM
bipA, with the parental strain doubling times represented by the dashed
horizontal lines. No marker indicates undetectable growth. Positive and
negative error bars represent the s.e.m.


function. The lack of compensation for bipA results in a strong and con- 
tinuous selective pressure to incorporate standard amino acids at the 
bipA position, so holB.d1 was not carried forward. 
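
The quantities defined in the Figure 2 legend (escape frequency as colonies per c.f.u. plated, assay limit as 1/(total c.f.u. plated), and the s.e.m. over technical replicates) reduce to simple arithmetic. The sketch below is illustrative only: the function names and the replicate counts are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev

def escape_frequency(escapee_colonies, cfu_plated):
    """Escapee colonies observed per colony forming unit plated."""
    return escapee_colonies / cfu_plated

def assay_limit(total_cfu_plated):
    """Detection limit: one escapee in everything that was plated."""
    return 1.0 / total_cfu_plated

def sem(values):
    """Standard error of the mean over replicates."""
    return stdev(values) / sqrt(len(values))

# Hypothetical counts for three technical replicates: (colonies, c.f.u. plated).
replicates = [(4, 1.0e9), (6, 1.2e9), (5, 0.9e9)]
frequencies = [escape_frequency(c, n) for c, n in replicates]
print(f"mean escape frequency: {mean(frequencies):.2e}")
print(f"s.e.m.: {sem(frequencies):.2e}")
print(f"assay limit: {assay_limit(sum(n for _, n in replicates)):.2e}")
```

When no colonies are observed, only the assay limit can be reported, which is why undetectable escape is quoted as a detection limit rather than as a frequency.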

We hypothesized that since the designed proteins have structurally 
distinct cores, each variant may favour different standard amino acids 
at the bipA position. Therefore, viable UAG suppressors for one en- 
zyme may be deleterious for another. We sought to determine the dis- 
tribution of standard amino acids accommodated at the UAG position 
in each variant to identify combinations of redesigned enzymes that could 
drive escape frequencies even lower. We cultured the top seven
dependent strains in permissive media and used MAGE to introduce all 64
codons at the UAG positions (Methods). For tyrS.d7 we collaterally
introduced the V307A mutation observed in tyrS.d6, since the same
oligonucleotide containing V307A was used to encode degeneracy for both
strains at the UAG position, producing the eighth strain tyrS.d8.
Immediately following electroporation, cells were shifted to non-permissive
media so that recombinants with canonical amino acids replacing bipA
would overtake the population according to their relative fitness.

We sampled the eight populations at 1-h and 4-h time points, at con- 
fluence, and at two subsequent passages to confluence (100-fold dilu- 
tion in each passage), by which point the preferred genotypes emerged. 
Using next-generation sequencing we determined the relative abundance 
of all standard amino acid codons at the UAG positions for each time 
point (Extended Data Fig. 5 and Supplementary Table 6) and computed 
the ‘flatness’ (Shannon entropy) of each amino acid frequency
distribution (Fig. 3). The two strains showing the greatest escape frequencies,
alaS.d5 and metG.d3, also have the flattest amino acid frequency
distributions. Correspondingly, the strains with the lowest escape
frequencies exhibit peaked amino acid frequency distributions. These amino
acid preference profiles show a strong relationship between structural
selectivity for bipA and escape frequency, supporting the rationale
underlying our computational strategy. Furthermore, they confirm our
hypothesis that different redesigned protein cores will favour different
standard amino acids. In particular, phenylalanine and tryptophan
(aromatics) dominate tyrS.d7 and tyrS.d8 populations, whereas the other
recombinants tend towards valine, leucine, isoleucine and methionine
(aliphatics) (Fig. 3a). In agreement with these observations, we were
able to isolate viable recombinants of adk.d6 containing leucine but not
tryptophan at the bipA position, while also isolating viable
recombinants of tyrS.d8 containing tryptophan but not leucine at the bipA
position (Supplementary Table 7). In considering candidates for
combination, we omitted alaS and metG due to their susceptibility to
near-cognate suppression. We also determined that pgk mutants can grow
robustly in the presence of pyruvate and/or succinate (Extended Data
Fig. 6) even though they do not grow in lysogeny broth Lennox (LB^L).
Since these carbon sources are environmentally available, pgk violates
our definition of essentiality and we removed pgk.d4 from consideration.
Finally, we excluded adk.d4 due to its poor survival at stationary
phase (Supplementary Table 8, Supplementary Discussion). We therefore
focused on combinations of tyrS.d6, tyrS.d7 and tyrS.d8 with adk.d6,
all of which maintain robust growth in permissive media, show strong
dependence for bipA, and are metabolically isolated from
environmental compounds.
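
The 'flatness' measure used above is the Shannon entropy of the amino acid frequency distribution at the UAG position. A minimal sketch follows; the function name and the read counts are hypothetical, and the base-2 logarithm (entropy in bits) is an assumption, since the text does not state the base.

```python
from math import log2

def shannon_entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    probabilities = [c / total for c in counts if c > 0]
    return -sum(p * log2(p) for p in probabilities)

# Hypothetical read counts per amino acid at one UAG position.
peaked = {"W": 900, "F": 80, "L": 20}            # one dominant residue
flat = {"V": 260, "L": 250, "I": 245, "M": 245}  # many tolerated residues

print(f"peaked: {shannon_entropy(peaked.values()):.2f} bits")
print(f"flat:   {shannon_entropy(flat.values()):.2f} bits")
```

A peaked distribution gives low entropy (few viable substitutions for bipA), whereas a flat distribution gives high entropy, which the text associates with higher escape frequencies.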

Combining tyrS designs with adk.d6 yielded three strains with no
detectable escapees after 24 h, including adk.d6_tyrS.d8, which has
undetectable growth after >72 h (detection limit 7.44 × 10⁻¹¹ escapees per
c.f.u., Fig. 2 and Supplementary Table 4). Colonies bearing the
adk.d6_tyrS.d8 genotype were observed between 4 and 7 days of incubation,
but showed severely impaired fitness when grown in non-permissive
liquid culture and were readily outcompeted by prototrophic E. coli
Figure 3 | Structural specificity at designed UAG positions in eight
NSAA-dependent enzymes correlates with escape frequencies. a, Amino acid
preferences at UAG positions in eight synthetic auxotrophs were determined
by replacing the UAG codon with full NNN degeneracy and then sequencing
the resulting populations with an Illumina MiSeq. Frequencies of each
amino acid as a fraction of total sequences observed after three 1:100 passages to
confluence are shown (top 11 most frequent amino acids only). Samples are
clustered by Euclidean distance between amino acid frequencies. The frequency
of an amino acid reports on the fitness conferred by the corresponding natural
amino acid suppressor at the UAG position relative to all other amino acids.
b, Shannon entropy was computed over the distributions of amino acids
preferred at the UAG positions of the eight single-enzyme auxotrophs and
plotted against the 48-h escape frequency for each strain. Entropy correlates
log-linearly with escape frequency, suggesting that enzyme cores with high
structural specificity for bipA at the UAG position have fewer fit evolutionary
routes to escape. Strains alaS.d5 and metG.d3 have a deactivated mutS gene.
(Fig. 4 and Supplementary Discussion). The relative reductions in escape
frequencies support the hypothesis that combining variants with
distinct amino acid preferences at the UAG position decreases fitness of
escapees. Even though tyrS.d6 and tyrS.d8 exhibit similar escape
frequencies as single-enzyme auxotrophs, strain adk.d6_tyrS.d6 produces
faster growing escapees (adk.d6 and tyrS.d6 share a preference for
leucine) than strain adk.d6_tyrS.d8 (tyrS.d8 prefers tryptophan). Although
tyrS.d7 and tyrS.d8 both prefer aromatic residues, tyrS.d7 exhibits a
broader amino acid preference profile (Fig. 3a) and produces
faster-growing escapees than tyrS.d8 (Fig. 2). Accordingly, adk.d6_tyrS.d8 yields
the lowest escape frequency of the combined tyrS and adk variants.


Prevention of mutagenic escape 


The appearance of escapee colonies from adk.d6_tyrS.d8 after >72 h 
suggests the emergence of rare genotypes conferring weak viability (dou- 
bling time =348 min) in the absence of bipA. To uncover mutagenic 
routes to escape we performed whole-genome sequencing on escapees 
of adk.d6, tyrS.d8 and adk.d6_tyrS.d8 (Methods, four escapees for each 
single-enzyme auxotroph and three escapees for the double-enzyme 
auxotroph were sequenced). We observed no mutations in the ribo- 
some or tRNAs that could account for UAG translation in the absence 
of bipA, nor did we observe mutations to any designed amino acid 
positions. However, we identified a point mutation (A70V) to tyrS in 
all four tyrS.d8 escapees sequenced (Supplementary Table 9). The A70V 
mutation may improve packing of the tyrS.d8 catalytic domain in the 
context of a destabilized neighbouring helical bundle lacking bipA (Extended
Data Fig. 7a). To validate this escape mechanism we produced
strain tyrS.d8.A70V and performed an escape assay on non-permissive 
media. Within 5 days of plating, we observed colony formation from all 
plated cells (Extended Data Fig. 7b), confirming that A70V is an escape 
mechanism for tyrS.d8. The A70V mutant of tyrS.d8 does not impair 
fitness in permissive conditions (Supplementary Table 10), so the geno- 
type spontaneously arises as a neutral mutation within the fitness land- 
scape by replication errors. However, targeted sequencing of the tyrS 
gene in eight additional tyrS.d8 escapees did not reveal the A70V mu- 
tation, suggesting that A70V is not the only escape mechanism for 
tyrS.d8. 

Whole-genome sequencing of adk.d6 and adk.d6_tyrS.d8 escapees 
revealed disruptive mutations to Lon protease in all seven cases. One 
clone contained a frame shift and another contained a non-synonymous 
substitution (L611P) within the lon gene. The remaining five cases had
a transposable element inserted within the promoter of lon. Targeted
Figure 4 | Competition between synthetic auxotroph escapees and
prototrophic E. coli. C321.ΔA was competed in the absence of bipA against
escapees from a single-enzyme bipA auxotroph (pgk.d4, moderate NSAA
dependence), or from a double-enzyme bipA auxotroph (adk.d6_tyrS.d8,
strong bipA-dependence). Populations were seeded with 100-fold excess
escapees and grown for 8 h in non-permissive conditions. The populations
were evaluated using flow cytometry for episomally expressed fluorescent
proteins at t = 0 and t = 8 h. Results from separate competition experiments
against three different escapees are shown for each synthetic auxotroph.

a, pgk.d4 escapees continue to expand in a mixed population with C321.ΔA
after 8 h. b, adk.d6_tyrS.d8 escapees are rapidly outcompeted by C321.ΔA,
which overtakes the population after 8 h.
sequencing characterized the insertion sequence in at least one clone
as IS186, exactly recapitulating the Lon protease deficiency of E. coli
BL21. We validated Lon disruption as an escape mechanism using λ
Red-mediated recombination to replace lon with a kanamycin resistance
gene (kanR) in adk.d6, tyrS.d8 and adk.d6_tyrS.d8. Recombinants
were replica plated from permissive to non-permissive media containing
kanamycin. Colony PCR confirmed that 27 of 27 non-bipA-dependent
colonies screened (9 escapees per dependent strain) had Lon deleted
by kanR.

Since the Lon protease is the primary apparatus for bulk degradation
of misfolded proteins in the E. coli cytoplasm, we hypothesized
that its disruption would allow the persistence of poorly folded adk.d6
and tyrS.d8 proteins when standard amino acids are incorporated in 
place of bipA. We further hypothesized that basal UAG suppression 
from the promiscuous activity of pEVOL-BipARS produced sufficient 
full-length protein to support viability in the absence of Lon-mediated 
degradation. To test this hypothesis and safeguard against Lon-mediated 
escape we pursued two independent strategies to reduce the activity of 
BipARS in non-permissive conditions. First, we reduced the gene copy 
number approximately tenfold by moving bipARS and tRNAbipA from
the p15A pEVOL plasmid to the genome of adk.d6, producing the strain 
adk.d6_int (Methods). Second, we applied our computational second- 
site suppressor strategy to residue V291 in bipARS (homologous to 
L303bipA in our tyrS designs) and reintroduced it into adk.d6 on the 
pEVOL vector, producing strain adk.d6_bipARS.d7 (Methods). This 
latter strategy produced a BipARS variant that requires bipA for fold- 
ing and function, thereby abrogating residual activity towards standard 
amino acids in the absence of bipA. Both strategies resulted in a >200- 
fold reduction in 7-day escape frequency (Supplementary Table 4). In- 
troducing tyrS.d8 to these strains produced double- and triple-enzyme 
synthetic auxotrophs adk.d6_tyrS.d8_int and adk.d6_tyrS.d8_bipARS.d7 
that exhibited undetectable escape when monitored for 14 days (Fig. 2 
and Supplementary Table 4, detection limit 2.2 x 10°’? escapees per 
c.f.u.). Both strains also showed undetectable escape in the presence of
arabinose (Supplementary Table 5), and presented no fitness impair- 
ment relative to the parental adk.d6_tyrS.d8 synthetic auxotroph (Sup- 
plementary Table 4, doubling times of 57 and 55 min). 


Protection from natural supplementation 


To compare synthetic auxotrophy to current biocontainment practices 
we generated natural metabolic auxotrophs by knocking out asd and 
thyA genes from an MG1655-derived E. coli strain (ECNR1). The asd 
knockout renders the strain dependent on diaminopimelic acid (DAP) 
for cell-wall biosynthesis, while the thyA knockout deprives the cell of
thymine, an essential nucleobase. These well-studied auxotrophies are
commonly incorporated into biocontainment strategies. In agreement
with previous studies, the asd knockout shows strong dependence on
its requisite metabolite, with a 7-day escape frequency of 8.97 X 10° 
escapees per c.f.u. (Supplementary Table 11). Knocking out thyA from
this strain to produce a double-enzyme auxotroph did not reduce the 
7-day escape frequency (8.79 X 10°” escapees per c.f.u.). Nevertheless, 
metabolic strategies could complement synthetic auxotrophies to im- 
prove escape frequencies in defined ecological niches. To test this prin- 
ciple we knocked out asd from the double-enzyme synthetic auxotrophs 
of adk and tyrS resulting in three triple-enzyme auxotrophs (adk.d6_ 
tyrS.d6_asd, adk.d6_tyrS.d7_asd and adk.d6_tyrS.d8_asd) that grew 
robustly in permissive conditions but showed undetectable escape after 
7 days on media lacking bipA and DAP (Fig. 2 and Supplementary 
Tables 4 and 5, detection limit 6.4 X 107’ escapees per c.f.u.). 

While bacterial growth assays are often carried out in variations of 
media enriched with yeast extract, GMOs are increasingly deployed 
among a diversity of ecosystems that may provide opportunities for 
scavenging or cross-feeding essential metabolites. To compare meta- 
bolic and synthetic auxotroph strategies in an environment mimicking 
endogenous bacterial communities we grew engineered variants of 
both natural and synthetic auxotrophs in LBL containing E. coli lysate
(Methods). We hypothesized that since DAP is an essential component
of the bacterial cell wall, the Δasd strains may scavenge sufficient DAP
from E. coli lysate to complement the auxotrophy. As anticipated, metabolic
auxotrophs obtained sufficient nutrients from the yeast/tryptone
(LBL) and the bacterial remnants (lysate) to support exponential growth
(Extended Data Fig. 6e-h), while the synthetic auxotrophs failed to 
circumvent their dependencies. These results highlight the importance 
of establishing auxotrophies for compounds that are not environmen- 
tally available, and of ensuring the metabolic essentiality of enzymes 
intended to confer dependence. 


Resistance to horizontal gene transfer 


HGT is an important mechanism of evolution in any genetically rich 
environment. We developed a conjugation escape assay to assess how
DNA transfer within an ecosystem enables a GMO to escape biocontainment.
Whereas any recombination event that replaces an inactivated
gene could overcome metabolic auxotrophies, we hypothesized
that conjugal escape would be disfavoured in GROs because donor DNA 
replacing bipA-dependent genes would also overwrite crucial genetic 
elements involved in genetic code reassignment (Fig. 5a). For example, 
reintroducing UAG stop codons into essential genes without restoring 
release-factor-1-mediated translational termination could be deleterious
or lethal. Furthermore, reintroducing release factor 1 would result
in competition between bipA incorporation and translational termina- 
tion, undermining the recoded functions of the GRO. 

To simulate a worst-case scenario in ecosystems containing a rich 
source of conjugal donors, we used Tn5 transposition to integrate an 
origin of transfer (oriT) into a population of E. coli MG1655 conjugal 
donor strains. We isolated a population of ~450 independent clones 
(one oriT for every ~10 kilobase portion of the 4.6 megabase (Mb) 
genome) and sequenced the flanking genomic regions of 96 donor 
colonies to confirm that oriT integration was well distributed through- 
out the population. We then conjugated this donor population into our 
auxotrophic strains at a ratio of 1 donor to 100 recipients to increase the
probability that conjugal transfer will initiate from one oriT position 
per recipient. Conjugation was performed for durations of 50 min and 
12h (average conjugation times predicted to transfer 0.5 and 7.2 genomes) 
to simulate a single conjugal interaction and an ecological worst-case 
scenario, respectively. Conjugal escapees were selected on non-permissive 
media, and 23 alleles distributed throughout the genome (Fig. 5a) were 
screened using multiplex allele-specific colony PCR (mascPCR) to assess 
how much of the recoded genome is replaced by wild-type donor DNA. 
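The figures quoted for the donor library and the mating times are mutually consistent, which a short worked calculation makes explicit. This is an illustrative sketch only: the function names are hypothetical, and the rate of 100 min per genome is inferred here from the stated pairing of 50 min with 0.5 genomes rather than taken from the text.

```python
GENOME_BP = 4_600_000      # 4.6 Mb genome, as stated in the text
DONOR_CLONES = 450         # ~450 independent oriT insertion clones

# Inferred assumption: 50 min of mating corresponds to 0.5 genomes,
# that is, roughly 100 min to transfer one genome equivalent.
MINUTES_PER_GENOME = 50 / 0.5

def orit_spacing_bp(genome_bp=GENOME_BP, clones=DONOR_CLONES):
    """Average genomic distance between oriT insertions in the donor pool."""
    return genome_bp / clones

def genomes_transferred(minutes):
    """Genome equivalents transferable in a mating of the given duration."""
    return minutes / MINUTES_PER_GENOME

print(round(orit_spacing_bp()))        # about one oriT per ~10 kb
print(genomes_transferred(50))         # single conjugal interaction
print(genomes_transferred(12 * 60))    # 12-h worst-case mating
```

With these numbers the 12-h mating corresponds to 7.2 genome equivalents, matching the value given above.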

Conjugal escape frequency decreases as the number of auxotrophic 
gene variants increases (Fig. 5b, top panel and Extended Data Fig. 8), 
consistent with larger portions of the genome that must be overwritten 
for conjugal escape of the multi-enzyme auxotrophs (Fig. 5b, bottom 
panel). The 12-h conjugations effect higher escape frequencies than do 
the 50-min conjugations, and the 12-h conjugations produce a larger 
diversity of conjugal escape genotypes, consistent with an increased op- 
portunity to initiate new conjugal transfers during the mating period. 
Encouragingly, all 50-min conjugal escapees from multi-enzyme aux- 
otrophs exhibit the wild-type donor sequence at all 23 assayed alleles 
(Fig. 5b, bottom panel and Supplementary Table 12), resulting in the 
reintroduction of release factor 1 and its UAG-mediated translational 
termination function. This collateral replacement of recombinant geno- 
mic DNA could be extended to other recombinant payloads such as 
toxins, antibiotic resistance genes, catabolic and genome editing en- 
zymes, and orthogonal aminoacyl-tRNA synthetase/tRNA pairs used 
for NSAA incorporation. 


Discussion 


Effective biocontainment mechanisms for GMOs should place high 
barriers between modified organisms and the natural environment. Our 
NSAA design strategy produces organisms with an altered chemical 
language that isolates them from natural ecosystems. By conferring de- 
pendence on synthetic metabolites at the level of protein translation, 
folding and function, synthetic auxotrophy addresses the need for GMOs
that are refractory to mutational escape, metabolic supplementation and 
HGT. Because our NSAAs are incorporated into essential enzymes with 




Figure 5 | Synthetic auxotrophy and genomic recoding reduce HGT- 
mediated escape. a, The positions of key alleles are plotted to scale on the 
genome schematic. Red lines indicate auxotrophies used in the multi-enzyme 
auxotrophs and grey lines indicate other auxotrophies that were not included 
in this assay. Asterisks indicate important alleles associated with the 
reassignment of UAG translation function (blue are essential genes and green 
are potentially important genes). Conjugation-mediated reversion of the
UAA codons back to the wild-type UAG is expected to be deleterious unless the 
natural UAG translational termination function is reverted. R1 and R2 denote 
replicores 1 and 2, respectively. b, Combining multiple synthetic auxotrophies 
in a single genome requires a large portion of the genome to be overwritten 
by wild-type donor DNA, reducing the frequency of conjugal escape (top panel) 
and increasing the likelihood of overwriting the portions of the genome 
(bottom panel) that provide expanded biological function (for example, prfA 
encodes release factor 1, which mediates translational termination at UAG 
codons). Positive error bars indicate standard deviation. 


second-site mutations, escapees are rare and are unfit to out-compete 
prototrophic microbial communities. In part, robustness emerges from 
simplicity: our most escape-resistant synthetic auxotrophs contain only 
32 (adk.d6_tyrS.d8_int) and 49 (adk.d6_tyrS.d8_bipARS.d7) base pair 
substitutions across the 4.6 Mb parental genome and bipARS, with no 
essential genes deleted or toxic products added. Furthermore, NSAA- 
based biocontainment with bipA only modestly increases the cost per 
litre of proliferating culture (Extended Data Table 2). 

This work highlights the delicate balance required to engineer es- 
sential proteins that are conditionally stabilized by a single NSAA. The 
design must confer sufficient instability in non-permissive conditions 
to deactivate the protein, while providing functional stability in the pres- 
ence of the correct NSAA. Future design strategies could include polar 
or charged NSAAs to engineer hydrogen bonds requiring exquisite 
geometric specificity for folding, enzyme-substrate interactions, or
macromolecular associations. This approach may reduce susceptibility 
to suppressors, although fewer protein microenvironments may accom- 
modate the burial of charged or polar residues. Reassigning additional 
codons would permit the incorporation of multiple NSAAs that confer 
dependence either in different structural motifs or in participation of a 
joint chemistry. Eventually, organisms with orthogonal genomic chemistries
including expanded genetic alphabets and their associated
replication machinery could provide additional layers of isolation.

Our results demonstrate that mutational escape frequency under lab- 
oratory growth conditions is a necessary but insufficient metric to eval- 
uate biocontainment strategies. Many genes considered to be essential 
have functions that can be complemented by environmental compounds, 
as demonstrated here for auxotrophies of natural (asd, thyA) and designed
(pgk.d4) enzymes. Furthermore, localizing biocontainment mechan- 
isms to a small portion of the genome increases susceptibility to escape 
by uptake of foreign DNA. Distributing multiple NSAA-dependent 
enzymes throughout a recoded genome acts as a genomic safeguard 
against escape by HGT, and demands that conjugal escape replaces 
large portions of the recipient genome. This collateral replacement of 
GMO genomic DNA could be exploited to delete recombinant payloads
upon exposure to conjugal donors in the environment. Additionally,
recoding restricted payloads with essential UAG codons can prevent
them from functioning in natural organisms. Therefore, the
expanded genetic code of GROs can be exploited both to prevent their 
undesired survival in natural ecosystems and to block incoming and 
outgoing HGT with natural organisms. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Essential protein selection and buried residue determination. Candidate genes
were selected by searching the Keio collection of comprehensive single-gene
E. coli K-12 knockouts for genes classified as essential. X-ray structures were identified
by mapping essential gene GenBank protein genInfo identifiers (PIDs) to
PDB entries through the UniProtKB. In cases of multiple PDB entries the
highest-resolution structure was selected. 112 high-resolution X-ray structures
(resolution ≤2.8 Å) were analysed. Structures were pre-processed to remove alternative
side-chain conformers (the first listed conformer was kept), to remove atoms
without occupancy, to remove heteroatoms, to convert selenomethionines to methionines,
and to remove chains other than the first listed chain of the essential protein.
The solvent-accessible surface area (SASA) of each position in each candidate structure
was calculated using the PyRosetta interface to the Rosetta SasaCalculator
class with a 1.0 Å probe radius (a radius smaller than 1.4 Å allowed finer sampling
of spaces around candidate positions). Positions were considered buried if their
SASA was not more than 20% of the residue-specific average SASA value from a
30-member random ensemble of Gly-X-Gly peptides, where X is the residue type,
as determined by the GETAREA method. The average SASA values (Å²) are Ala,
64.9; Arg, 195.5; Asn, 114.3; Asp, 113.0; Cys, 102.3; Gln, 143.7; Glu, 141.2; His, 
154.6; Ile, 147.3; Gly, 87.2; Leu, 146.2; Lys, 164.5; Met, 158.3; Phe, 180.1; Pro, 105.2; 
Ser, 77.4; Thr, 106.2; Trp, 224.6; Tyr, 193.1; Val, 122.3. By these criteria 13,564 
residues in the data set were considered buried. 
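The burial criterion above reduces to a per-residue threshold test, sketched below. The reference values are those listed in the text; the function and variable names are hypothetical, and the per-residue SASA input is assumed to have been computed elsewhere (for example with the 1.0 Å probe described above).

```python
# Average Gly-X-Gly SASA values (in square angstroms) as listed in the text.
REFERENCE_SASA = {
    "Ala": 64.9, "Arg": 195.5, "Asn": 114.3, "Asp": 113.0, "Cys": 102.3,
    "Gln": 143.7, "Glu": 141.2, "His": 154.6, "Ile": 147.3, "Gly": 87.2,
    "Leu": 146.2, "Lys": 164.5, "Met": 158.3, "Phe": 180.1, "Pro": 105.2,
    "Ser": 77.4, "Thr": 106.2, "Trp": 224.6, "Tyr": 193.1, "Val": 122.3,
}

BURIED_FRACTION = 0.20  # "not more than 20%" of the reference value

def is_buried(residue_type, sasa):
    """Return True if a residue's SASA is at most 20% of its reference value."""
    reference = REFERENCE_SASA[residue_type]
    return sasa <= BURIED_FRACTION * reference

# Hypothetical per-residue SASA values for illustration only.
print(is_buried("Trp", 40.0))  # threshold is 44.92, so buried
print(is_buried("Gly", 20.0))  # threshold is 17.44, so not buried
```

Because the threshold is relative to residue type, a large side chain such as tryptophan can expose more absolute surface than glycine and still count as buried.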

Design and refinement of NSAA-dependent proteins. The side chains of each 
structure were relaxed into local minima of the Rosetta forcefield by the Rosetta 
sidechain_min application (Rosetta command lines, below). Three separate design 
simulations were then carried out for each target buried position using RosettaDesign.
The first simulation sets the target position to L-4,4'-biphenylalanine
(bipA), as implemented in Rosetta (residue type B30), and sets the surrounding
residues to either redesign (varies both amino acid identity and side-chain conformation)
or repack (varies only the conformation of the wild-type amino acid).
Residues with Cα atoms within 6 Å of the target position, or with Cα atoms within
8 Å of the target position and Cβ atoms closer than the Cα atom to the target position,
were set to redesign. Residues with Cα atoms within 10 Å of the target position,
or with Cα atoms within 12 Å of the target position and Cβ atoms closer than
the Cα atom to the target position, were set to repack. All other side chains were
fixed at their minimized coordinates, together with all backbone atoms. The result- 
ing energy terms were appended with the target position SASA as calculated by the 
PyRosetta SasaCalculator with a 1.0 Å probe radius. We term the Rosetta scores of
these designs ‘compensated scores’. In the second simulation, the same calculation 
is performed, except all positions previously set to redesign are restricted only to 
repack. We term the resulting scores ‘uncompensated scores’. The difference be- 
tween the ‘compensated score’ and the ‘uncompensated score’ reports on the extent 
to which the target site must change to accommodate bipA. In the third simulation, 
the target position maintains its wild-type identity, all coordinates are fixed at the 
positions output by the sidechain_min application, and the structure is rescored 
using the same scoring parameters as the other two simulations (Rosetta command 
lines, below). We term the resulting scores ‘wild-type scores’. The difference between 
the compensated score and the wild-type score reports on the predicted stability of 
the redesigned core relative to the wild-type structure. 
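The distance rules that assign neighbouring residues to redesign, repack or fixed shells can be sketched as a small classifier. This is an illustration under stated assumptions, not the authors' resfile generator: the function name is hypothetical, the target position is represented by a single reference coordinate (the text does not say which atom is used), and residues without a Cβ (glycine) are treated as never satisfying the "Cβ closer than Cα" condition.

```python
import math

def classify_neighbour(ca_xyz, cb_xyz, target_xyz):
    """Assign a residue to 'redesign', 'repack' or 'fixed'.

    ca_xyz / cb_xyz: coordinates of the residue's C-alpha and C-beta atoms
    (cb_xyz may be None for glycine). target_xyz: reference coordinate of
    the target (bipA) position; which atom to use is an assumption.
    """
    d_ca = math.dist(ca_xyz, target_xyz)
    # A side chain "points toward" the target if C-beta is closer than C-alpha.
    points_toward = cb_xyz is not None and math.dist(cb_xyz, target_xyz) < d_ca

    if d_ca <= 6.0 or (d_ca <= 8.0 and points_toward):
        return "redesign"
    if d_ca <= 10.0 or (d_ca <= 12.0 and points_toward):
        return "repack"
    return "fixed"

origin = (0.0, 0.0, 0.0)
print(classify_neighbour((7.0, 0.0, 0.0), (6.0, 0.0, 0.0), origin))   # redesign
print(classify_neighbour((7.0, 0.0, 0.0), (8.0, 0.0, 0.0), origin))   # repack
print(classify_neighbour((13.0, 0.0, 0.0), (12.0, 0.0, 0.0), origin)) # fixed
```

The orientation test widens each shell only for side chains that point toward the target, so residues that face away at intermediate distances are left in the less permissive category.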

The design goal was to obtain variants that are functionally stable with bipA at 
the designed position, while being maximally destabilized with a natural amino acid 
at the bipA position. Accordingly, designs were filtered for the following criteria: 

(1) The minimized wild-type score must be less than 10 Rosetta energy units to 
ensure that the starting structure can be reasonably modelled with the Rosetta 
forcefield. 

(2) The compensated score must be less than or equal to the wild-type score, to 
select for redesigned cores that do not destabilize the protein relative to the wild- 
type sequence. 

(3) The uncompensated score must be greater than the wild-type score, to en- 
sure that compensatory mutations are necessary to accommodate bipA. 

(4) The compensated score must be less than the uncompensated score, to select 
for compensatory mutations that improve the stability of the core in the presence 
of bipA relative to the uncompensated mutant. This requirement also selects for 
sequences that reduce the fitness of suppressors at the compensatory positions. 

(5) The SASA score must be <0.75, to select for cores that tightly pack around 
bipA, both to select for stability in the presence of bipA and to reduce the fitness of 
standard amino acids at the bipA or compensatory positions. 
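The five computational filters above can be expressed as a single predicate. This is a minimal sketch: the function and argument names are hypothetical, scores are assumed to be in Rosetta energy units (lower is more favourable), and the SASA score is used exactly as thresholded in criterion (5).

```python
def passes_design_filters(wild_type, compensated, uncompensated, sasa_score):
    """Apply criteria (1)-(5) to one candidate design."""
    return (
        wild_type < 10.0                  # (1) starting structure is modelled reasonably
        and compensated <= wild_type      # (2) redesigned core is not destabilized
        and uncompensated > wild_type     # (3) compensatory mutations are needed
        and compensated < uncompensated   # (4) compensation improves on bipA alone
        and sasa_score < 0.75             # (5) core packs tightly around bipA
    )

# Hypothetical scores for illustration only.
print(passes_design_filters(-5.0, -6.0, -2.0, 0.5))   # satisfies all five
print(passes_design_filters(-5.0, -4.0, -2.0, 0.5))   # fails (2)
```

Note that under this reading criterion (4) follows automatically whenever (2) and (3) both hold; it is kept explicit here to mirror the list as written.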

The designs for 16 engineered UAG sites in 12 enzymes meeting these criteria 
were then ranked by the difference between compensated score and uncompensated 
score, as a key metric for bipA dependence, and were further filtered by the following 
criteria based on known structural and functional data from the literature: 

(1) The redesigned residues must not participate in ligand binding, catalysis or 
be required for allosteric signal transduction via conformational rearrangements. 
(2) The product of the reaction must not be environmentally available.

(3) The product of the reaction must not be complementable by another envi- 
ronmentally available molecule. 

Using these criteria 10 designs were subject to refinement. Positions to design, 
repack, or revert to wild-type were selected by visual inspection. A second round 
of fixed backbone design was then applied to generate 100 designs from each un- 
refined structure (Rosetta command lines, below). Designs from six enzymes were 
carried forward for experimental characterization. Frequently occurring mutations 
in the refined designs assessed by visual inspection were included in MAGE oli- 
gonucleotides (Supplementary Table 1). For tyrS, eight additional all-atom designs 
were encoded by PCR primers for gene assembly (Supplementary Table 3). 
Rosetta command lines. All Rosetta calculations were performed with Rosetta 
version 48561. 

Example command line for preparative side-chain minimization of scaffold
structures:

sidechain_min.linuxgccrelease -database __ROSETTA_DATABASE__ -loops::input_pdb __PREFIX__.pdb -output_tag __PREFIX__ -ex1 -ex2 -overwrite

Example command line for wild-type score:

score.linuxgccrelease -database __ROSETTA_DATABASE__ -l *.pdb -score::hbond_His_Phil_fix -in:file:fullatom -no_optH -no_his_his_pairE -score:weights mm_std

Example command line for initial design for compensated score and uncompensated
score:

fixbb.linuxgccrelease -database __ROSETTA_DATABASE__ -ex1 -ex2 -s __PREFIX__.pdb -resfile __PREFIX__.resfile -minimize_sidechains -score:weights mm_std -score::hbond_His_Phil_fix -no_his_his_pairE -nstruct 1 -out:pdb_gz -overwrite

Example command line for refinement of dependent designs:

fixbb.linuxgccrelease -database __ROSETTA_DATABASE__ -s 2YXNA_min.pdb -minimize_sidechains -score::hbond_His_Phil_fix -no_his_his_pairE -ex1 -ex2 -nstruct 100 -resfile __PREFIX__.resfile -overwrite

Culture and selection conditions. Growth media consisted of LBL (10 g l−1 bacto
tryptone, 5 g l−1 sodium chloride, 5 g l−1 yeast extract). Permissive growth media
for bipA-dependent auxotrophs was LBL supplemented with sodium dodecyl sulfate
(SDS), chloramphenicol, bipA and arabinose. Non-permissive media lacked
bipA and arabinose. The following selective agents, nutrients and inducers were
used when indicated: chloramphenicol (20 µg ml−1), kanamycin (30 µg ml−1),
spectinomycin (95 µg ml−1), tetracycline (12 µg ml−1), zeocin (10 µg ml−1),
gentamycin (5 µg ml−1), SDS (0.005% w/v), vancomycin (64 µg ml−1), colicin E1 (ColE1;
~10 µg ml−1), DAP (75 µg ml−1), thymidine (100 µg ml−1), bipA (10 µM), glucose
(0.2% w/v), pyruvate (0.2% w/v), succinate (0.2% w/v), arabinose (0.2% w/v),
anhydrotetracycline (30 ng µl−1). For strains adk.d6_tyrS.d8_bipARS.d7 and
adk.d6_tyrS.d8_int permissive media contained 100 µM bipA. Permissive media for
metabolic auxotrophs is LBL supplemented with 75 µg ml−1 DAP and 100 µg ml−1
thymidine. TolC selections (SDS) and counter selections (colicin E1) were performed
as previously described. Tdk selections used LBL supplemented with 20 µg ml−1
2'-deoxy-5-fluorouridine and 100 µg ml−1 deoxythymidine, and counter selections
used LBL supplemented with 5 µM azidothymidine.

Strain engineering. Two strategies were undertaken to engineer redesigned essential
proteins. Strains adk.d4, adk.d6, alaS.d5, holB.d1, metG.d3 and pgk.d4 were
generated by performing CoS-MAGE with designed single-stranded oligonucleotide
pools (Supplementary Table 1) and tolC co-selection. Recombined populations
were plated on permissive media, and then replica plated on non-permissive
media to screen for bipA-dependent clones. Top candidates were identified by kinetic
growth monitoring (Biotek H1 or H4 plate reader) of 10-40 bipA-dependent
clones in permissive and non-permissive liquid growth media. Strains showing
robust growth in permissive media and little to no growth in non-permissive media
were carried forward. The tyrS.d6, tyrS.d7 and tyrS.d8 gene variants were constructed
by PCR amplification of the E. coli MG1655 tyrS gene with mutagenic
primers, followed by full-length Gibson assembly (Supplementary Tables 1 and 3)
and recombination onto the genome using λ Red recombineering. Strains
tyrS.d6, tyrS.d7, tyrS.d8, adk.d6_tyrS.d6, adk.d6_tyrS.d7 and adk.d6_tyrS.d8 were
produced by (1) deleting the endogenous tdk gene from C321.ΔA, (2) replacing the
endogenous adk and tyrS genes with their codon-shuffled variants (adk(recode)-tdk
and tyrS(recode)-tdk, Supplementary Table 3) transcriptionally fused to tdk,
and (3) replacing the fusion cassettes with the adk.d6, tyrS.d6, tyrS.d7, or tyrS.d8
variants. Variants of adk.d6, tyrS.d7 and tyrS.d8 containing leucine and tryptophan
at the bipA position were constructed by MAGE with oligonucleotides containing
the appropriate mutations and clonal populations were isolated on LBL plates lacking
bipA and arabinose. Triple-enzyme auxotrophs were created by replacing asd
with a Δasd::specR cassette. We reactivated mismatch repair using mutS_null_revert-2*
in the pgk, adk and tyrS single-enzyme auxotrophs and all of the multi-enzyme
auxotrophs. For construction of the quadruplet tRNAbipA (Supplementary
Discussion) QuikChange was used to replace the CUA anticodon with UCUA.
Quadruplet versions of adk.d6 and tyrS.d8 with UAGA at the bipA positions were
constructed by PCR and Gibson assembly followed by λ Red-mediated recombination
into the genome as described above. All genotypes (Supplementary Table 3)
were confirmed using mascPCR and Sanger sequencing using primers from Supplementary
Table 1.

Strain doubling time analysis. Strain doubling times were calculated as previously
described. Briefly, cultures were grown in flat-bottom 96-well plates (150 µl LBL,
34 °C, 300 r.p.m.). Kinetic growth (OD600) was monitored on a Biotek H1 plate
reader at 5-min intervals. Doubling times were calculated by
$t_{\mathrm{double}} = \Delta t \times \ln(2)/m$, where $\Delta t$ = 5 min per time point
and $m$ is the maximum slope of ln(OD600)
calculated from the linear regression through a sliding window of 5 contiguous time
points (20 min intervals). For escapee strains exhibiting growth rates that were too
slow for this analysis, doubling times were calculated by
$t_{\mathrm{double}} = \Delta t \times \ln(2)/\ln(P_2/P_1)$,
where $\Delta t$ represents sliding windows of 15 min and $P_1$/$P_2$ represents initial/
final OD600 values for the window. Strains that exhibited doubling times greater
than 900 min and/or maximum OD600 values less than 0.2 after the specified culture
duration were considered to exhibit no growth ('none observed') for the given conditions.
Improved aeration doubling times for strains adk.d6_tyrS.d8_int and
adk.d6_tyrS.d8_bipARS.d7 were obtained by growing strains in 3 ml LBL in 28 ml culture
tubes (three tubes dedicated for each time point), measuring OD600 of three
technical replicates in 1 cm cuvettes in a spectrophotometer (Beckman DU640) at
20 min intervals for 3.66 h, and determining the slope of the log transformed data
over each 40 min window. Least doubling time (20 min × ln(2)/slope) and corresponding
R² values are reported (Supplementary Table 4).
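The two doubling-time formulas above can be sketched in a few lines. This is an illustrative implementation under stated assumptions rather than the authors' analysis script: the function names are hypothetical, the slope is taken per time point (so that multiplying by Δt gives minutes), and the synthetic growth curve is invented for the example.

```python
import math

def window_slope(values):
    """Least-squares slope of `values` against their index (per time point)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    return sxy / sxx

def doubling_time(od_readings, dt_min=5.0, window=5):
    """t_double = dt * ln(2) / m, with m the maximum sliding-window slope of ln(OD)."""
    log_od = [math.log(v) for v in od_readings]
    if len(log_od) < window:
        raise ValueError("need at least one full window of readings")
    m = max(window_slope(log_od[i:i + window])
            for i in range(len(log_od) - window + 1))
    return math.inf if m <= 0 else dt_min * math.log(2) / m

def slow_doubling_time(p_initial, p_final, dt_min=15.0):
    """t_double = dt * ln(2) / ln(P2/P1) for slowly growing escapees."""
    return dt_min * math.log(2) / math.log(p_final / p_initial)

# Synthetic exponential culture with a 60-min doubling time, read every 5 min.
readings = [0.01 * 2 ** (5 * i / 60) for i in range(30)]
print(doubling_time(readings))
print(slow_doubling_time(0.1, 0.2))
```

Taking the maximum slope over all windows reports the fastest sustained growth phase, so lag and stationary phases do not inflate the estimate.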

Expression and purification of tyrS.d7. C321.ΔA cells were grown to mid-log
phase in LBL and co-electrotransformed with 5 ng each of plasmid pEVOL-bipA
and an additional plasmid containing full-length tyrS.d7 as an amino-terminal
GST fusion under an anhydrotetracycline (aTc)-inducible promoter. After 90 min
of recovery cells were plated on LBL agar supplemented with chloramphenicol and
kanamycin. Single colonies were used to inoculate 2 ml starter cultures of LBL
supplemented with chloramphenicol and kanamycin that were grown overnight at
34 °C. Saturated overnight growths were diluted 1:100 into six 1 l cultures containing
LBL supplemented with chloramphenicol and kanamycin, which were grown
at 34 °C with shaking at 250 r.p.m. to an OD600 of 0.6. The temperature was then
reduced to 18 °C, and bipA was added to a final concentration of 500 µM. After an
additional 60-90 min aTc and arabinose were added to final concentrations of
30 ng ml−1 and 0.2%, respectively. After 24 h of expression, cells were harvested by
centrifugation at 10,000g and snap frozen in a dry ice and ethanol bath. Approximately
10 g of thawed cell pellet was suspended in 100 ml of Buffer A (20 mM Tris-HCl
(pH 7.2), 500 mM NaCl, and 5% (v/v) glycerol) supplemented with 1 mg ml−1
lysozyme. After sonication (6 cycles of 30 s each), the cell lysate was centrifuged at
20,000g for 20 min at 4 °C, and the supernatant was mixed with 5 ml of polyethyleneimine
(pH 7.9) on ice. After centrifugation again at 20,000g for 10 min at 4 °C,
the supernatant was filtered through a 0.45 µm PVDF membrane, and suspended
with 3 ml of glutathione sepharose 4B beads (GE Healthcare Life Sciences). The
beads were extensively washed with Buffer A supplemented with 1 mM dithiothreitol
(DTT) and incubated with 120 units of PreScission protease (GE Healthcare Life
Sciences) at 5 °C for 16 h. The untagged protein was eluted and dialysed against
Buffer B (20 mM Tris-HCl (pH 7.5), 50 mM NaCl, 10 mM 2-mercaptoethanol and
5% (v/v) glycerol) at 4 °C. The redesigned enzyme was concentrated to approximately
5.5 mg ml−1.

Determination of the tyrS.d7 crystallographic structure. One-microlitre drops 
of protein were mixed with an equal volume of a reservoir solution containing 
0.1 M sodium malonate (pH 5.5) and 18% (w/v) polyethylene glycol 3350. Crys- 
tals were grown at room temperature via hanging drop vapour phase diffusion, 
and then were transferred into 0.1 M sodium malonate (pH 5.5), 25% (w/v) poly- 
ethylene glycol 3350 and 15% ethylene glycol. Crystals were frozen in liquid nitro- 
gen, and diffraction data were collected at the Advanced Light Source (ALS) beamline 
5.0.1. The data were processed using the HKL2000 package. The crystal structure
of a truncated E. coli TyrS variant protein (PDB code 2YXN) was used as a search
model in molecular replacement. The crystallographic model was built using COOT,
refined using REFMAC5 and Crystallography and NMR system (CNS), and deposited
in the RCSB Protein Data Bank (PDB code 4OUD). Statistics of the data
collection and refinement are provided in Extended Data Table 1. 

Mass spectrometry of NSAA-dependent enzymes. Strains adk.d6, tyrS.d7 and 
tyrS.d8 were grown to mid-log phase in 10 ml of permissive media. Cell pellets 
were obtained and soluble lysate fractions were collected as above. Samples were 
normalized to 250 µg (adk.d6) or 50 µg (tyrS.d7 and tyrS.d8) total protein content
and resolved by SDS-PAGE. Gel slices from each strain containing the enzyme 
(resolved by size comparison to a known standard) were digested with trypsin. 
Peptide sequence analysis of each digestion mixture was performed by microcapillary
reversed-phase high-performance liquid chromatography coupled with
nanoelectrospray tandem mass spectrometry (µLC-MS/MS) on a LTQ-Orbitrap
Elite mass spectrometer (ThermoFisher Scientific, San Jose, CA). The Orbitrap 
repetitively surveyed an m/z range from 395 to 1,600, while data-dependent MS/ 
MS spectra on the 20 most abundant ions in each survey scan were acquired in the 
linear ion trap. MS/MS spectra were acquired with relative collision energy of 30%, 
2.5-Da isolation width, and recurring ions dynamically excluded for 60 s. Preliminary
sequencing of peptides was facilitated with the SEQUEST algorithm with a 30
p.p.m. mass tolerance against the Uniprot Knowledgebase E. coli K-12 reference
proteome supplemented with a database of common laboratory contaminants, con- 
catenated to a reverse decoy database. Using a custom version of Proteomics Browser 
Suite (PBS v.2.7, ThermoFisher Scientific), peptide-spectrum matches were accepted 
with mass error <2.5 p.p.m. and score thresholds to attain an estimated false dis- 
covery rate of ~1%. 
Western blot analysis of tyrS.d7 variant GST fusions. Cell pellets for all variants 
were obtained as described above, with an expression culture volume of 10 ml. 
Cells were lysed using B-PER Bacterial Protein Extraction Reagent, lysozyme 
(100 mg ml^-1), DNase I (5,000 U ml^-1), and Halt Protease Inhibitor Cocktail (all
Thermo Scientific) according to the manufacturer’s specifications. Lysates were 
centrifuged at 15,000g for 5 min and the soluble fractions were collected. Protein 
concentration was determined fluorometrically using the Qubit Protein Assay Kit 
(Life Technologies). Lysates were normalized to 5-µg samples, resolved by SDS-PAGE,
and electro-blotted onto PVDF membranes (Life Technologies, number
IB24002). Western blotting was performed using an anti-GST mouse monoclonal
primary antibody (Genscript, number A00865-40) and anti-GAPDH mouse mono- 
clonal loading control antibody (Thermo Scientific, number MA5-15738) followed 
by secondary binding to a HRP-conjugated anti-mouse antibody (Thermo Scien- 
tific, number 35080). Samples were imaged by luminol chemiluminescence on a 
ChemiDoc system (BioRad) and protein content was quantified by densitometry 
and normalized to GAPDH. 
Solid media escape assays for natural metabolic and synthetic auxotrophs. All 
strains were grown in permissive conditions and harvested in late exponential 
phase. Cells were washed twice in LB^L and resuspended in LB^L. Viable c.f.u. were
calculated from the mean and standard error of the mean (s.e.m.) of three tech- 
nical replicates of tenfold serial dilutions on permissive media. Three technical 
replicates were plated on non-permissive media and monitored for 7 days. The order 
of magnitude of cells plated ranged from 10” to 10° depending on the escape 
frequency of the strain. Synthetic auxotrophs were plated on two different
non-permissive media conditions: SC, LB^L with SDS and chloramphenicol (Supplementary
Table 4); and SCA, LB^L with SDS, chloramphenicol and 0.2% arabinose
(Supplementary Table 5). Metabolic auxotrophs were plated on LB^L for
non-permissive conditions (Supplementary Table 11). If synthetic auxotrophs exhibited
escape frequencies above the detection limit (lawns) on SC at day 1, 2 or 7 (alaS.d5, 
metG.d3, tyrS.d7), escape frequencies for those days were calculated from addi- 
tional platings at lower density. Additional platings at higher density were also 
used to obtain day 1 and day 2 escape frequencies for pgk.d4 on SC. The s.e.m. S_v
across technical replicates of the cumulative escape frequency v scored for a given
day was calculated as:

$$S_v = v \sqrt{\left(\frac{S_T}{T}\right)^2 + \left(\frac{S_n}{n}\right)^2}$$

where T is the mean number of c.f.u. plated, S_T is the s.e.m. of c.f.u. plated,
n is the mean cumulative colony count up to the given day, and S_n is the s.e.m.
of the cumulative colony count up to the
given day. If synthetic auxotroph escapees emerged on SC, three clones were iso- 
lated, their growth rates were calculated as described above, and the doubling time 
of the fastest escapee was recorded (Supplementary Table 4). 
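
The error propagation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes the escape frequency is v = n / T and that its s.e.m. follows the standard first-order rule for a ratio, S_v = v * sqrt((S_T/T)^2 + (S_n/n)^2). The function name and the choice to reject non-positive means are assumptions made here.

```python
import math


def escape_frequency_sem(mean_cfu_plated, sem_cfu_plated, mean_colonies, sem_colonies):
    """Return (v, S_v) for one strain and one scoring day.

    v   = mean cumulative colony count / mean c.f.u. plated
    S_v = v * sqrt((S_T / T)^2 + (S_n / n)^2)   (first-order propagation)
    """
    if mean_cfu_plated <= 0 or mean_colonies <= 0:
        # Assumption of this sketch: relative errors are undefined when a mean
        # is zero, so such cases are rejected rather than guessed at.
        raise ValueError("both means must be positive")
    v = mean_colonies / mean_cfu_plated
    rel_T = sem_cfu_plated / mean_cfu_plated  # relative s.e.m. of c.f.u. plated
    rel_n = sem_colonies / mean_colonies      # relative s.e.m. of colony count
    return v, v * math.sqrt(rel_T ** 2 + rel_n ** 2)


# Hypothetical numbers, for illustration only.
v, s_v = escape_frequency_sem(1e9, 1e8, 4.0, 1.0)
print(f"escape frequency = {v:.2e} +/- {s_v:.2e}")
```

Because the two relative errors add in quadrature, the larger of the plating error and the colony-count error dominates S_v.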

Site saturation mutagenesis at designed UAG positions. To site-specifically re- 
place UAG with all other codons, we used MAGE oligonucleotide pools that ex- 
actly matched the sequence of the bipA-dependent gene except that the UAG was 
replaced by all 64 NNN codons (Supplementary Table 1). This allowed us to assess 
which canonical amino acid substitutions resulted in the best survival of synthetic 
auxotroph escapees. Although some of these amino acid substitutions may be un- 
likely to be evolutionarily sampled (evolution will favour amino acids with many 
tRNA gene copies and whose cognate codons are a single mutation from UAG),
this unbiased strategy avoided missing mechanisms of tRNA suppression that are 
not yet characterized. Immediately after introducing NNN codon diversity via 
MAGE, we recovered the cell populations in 1 ml of LB^L without supplementing
antibiotics, arabinose, or bipA. At this point, functional proteins using bipA for pro- 
per expression, folding and function are still present in the cell, but protein turnover 
eventually replaces the bipA-dependent proteins with bipA-independent variants 
in which the UAG codon is replaced by one of the 64 codons. This in turn provides 
a strong selection for canonical amino acids that can replace bipA and maintain 
protein function. Samples of the population were taken at five time points after
electrotransformation to track the population dynamics: after 1 h, 100 µl of culture
was centrifuged at 16,000g, resuspended in 20 µl distilled water (dH2O), and frozen
at −20 °C (time point 1); 2 ml of LB^L was added to the culture and then growth was
allowed to proceed for 3 more hours before 100 µl of culture was centrifuged at
16,000g, resuspended in 20 µl dH2O, and frozen at −20 °C (time point 2); the
remaining culture was grown overnight to confluence after which 500 µl of culture
was centrifuged at 16,000g, resuspended in 500 µl dH2O, and frozen at −20 °C (time
point 3); 30 µl of confluent culture was diluted into 3 ml of fresh LB^L and re-grown
to confluence after which 500 µl of culture was centrifuged at 16,000g, resuspended
in 500 µl dH2O, and frozen at −20 °C (time point 4); finally, 30 µl of confluent
culture was diluted into 3 ml of fresh LB^L and re-grown to confluence after which 500 µl
of culture was centrifuged at 16,000g, resuspended in 500 µl dH2O, and frozen at
−20 °C (time point 5). After sampling was complete, we had obtained five time points
from eight strains amounting to 40 total samples. Population dynamics were ana- 
lysed by next-generation sequencing. 

Next-generation sequencing of populations with degeneracy introduced at UAG 
positions. We designed custom primers to amplify ~ 127-146 base pairs (bp) sur- 
rounding the UAG codon of each variant and to add Illumina adapters and bar- 
codes for sequencing. In order to reduce primer dimers, we redesigned the P5 
primer binding sequence (Sol-P5_alt-PCR, Supplementary Table 1). We used PCR 
to introduce Illumina sequencing primer binding sites separated from the target 
amplicon by a 4-6 bp ‘heterogeneity spacer’ that allows low diversity Illumina
libraries to be sequenced out of phase (Supplementary Table 1). We estimate that
~10^6 cells (1 µl of a confluent culture containing ~10^9 cells per ml) were assayed at
each time point. This PCR was performed in 20 µl reactions containing 10 µl of
KAPA HiFi HotStart ReadyMix, 9 µl of dH2O, 0.5 µl of each 20 µM primer, and
1 µl of template cells. Thermocycling (BioRad C1000 thermocycler) involved heat
activation at 95 °C for 3 min, followed by 30 cycles of denaturation at 98 °C for 20 s,
annealing at 62 °C for 15 s, and elongation at 72 °C for 30 s with a final elongation
for 1 min (PCR1). PCR1 products (20 µl) were purified with magNA beads (40 µl)
and eluted in 20 µl of dH2O. A second PCR (PCR2) amplification introduced Illumina
adapters tagged with a unique 6 bp barcode (on the P7 adaptor) for each
sample and time point. The PCR2 thermocycling and purification protocols were 
identical to those of PCR1 except that the products from PCR1 were used as tem- 
plate and different primers were used. The final DNA libraries were checked on a 
1.5% w/v agarose gel and quantitated using a NanoDrop ND-1000 spectropho- 
tometer. Equimolar samples of all 40 libraries were combined in a single tube and 
sequenced using a SE50 kit on an Illumina MiSeq (Dana Farber Cancer Institute 
Molecular Biology Core Facility). The P7 and Index1 reads were performed with
standard sequencing primers, whereas the P5 read was sequenced with a custom 
primer (Sol-P5_alt-PCR, Supplementary Table 1). 

Sequencing analysis of populations with NNN degeneracy at UAG positions. 
A simple Python script was written to tally each of the 64 UAG-NNN codon 
mutations and 21 amino acid/translational stop substitutions. We discarded all 
reads that were too short to discern the NNN codon. For all other reads, a con- 
stant seed sequence was indexed within the read, and the NNN codon was located 
based on proximity to this known seed sequence. The NNN codon identity and 
translated amino acid identities were stored in dictionaries entitled ‘codons’ and ‘aas’,
respectively. The dictionaries and code are available together at GitHub
(https://github.com/churchlab/NNN_sequencing_scripts).
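
The tallying approach described in this section can be illustrated with a short Python sketch. It is not the authors' script (that code is in the repository cited above); the function name, the `seed` and `offset` parameters and the example reads are hypothetical. The sketch assumes the NNN codon sits a fixed number of bases after the end of a constant seed sequence, and it translates with the standard genetic code, in which TAG is a stop.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def tally_codons(reads, seed, offset=0):
    """Count the codon found `offset` bases after `seed` in each read.

    Reads without the seed, too short to cover the codon, or with an
    ambiguous base (e.g. N) in the codon are discarded.
    """
    codons, aas = Counter(), Counter()
    for read in reads:
        hit = read.find(seed)
        if hit < 0:
            continue  # seed not found in this read
        start = hit + len(seed) + offset
        codon = read[start:start + 3]
        if len(codon) < 3 or codon not in CODON_TABLE:
            continue  # too short or contains a non-ACGT base
        codons[codon] += 1
        aas[CODON_TABLE[codon]] += 1
    return codons, aas


# Hypothetical reads: two carry TGG (W), one carries CTG (L), two are discarded.
reads = ["GGACGTACTGGTT", "ACGTACCTGAA", "ACGTACTT", "TTTTTTTT", "ACGTACTGGA"]
codons, aas = tally_codons(reads, seed="ACGTAC")
print(dict(codons), dict(aas))
```

Locating the codon relative to a seed rather than at a fixed read position keeps the tally valid when a heterogeneity spacer shifts the amplicon within the read.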

Shannon entropy calculations. Shannon entropy was calculated using the standard
relation $H(X) = -\sum_i P(x_i) \log P(x_i)$.
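
As a worked illustration of this relation, the sketch below computes the entropy of a set of substitution counts. The text does not state the logarithm base, so the default of base 2 (entropy in bits) is an assumption, as is the function name; zero counts are skipped because p log p tends to 0 as p tends to 0.

```python
import math


def shannon_entropy(counts, base=2):
    """Shannon entropy H = -sum_i p_i * log(p_i) of a list of counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c > 0:  # terms with p = 0 contribute nothing
            p = c / total
            h -= p * math.log(p, base)
    return h


# Four equally frequent outcomes carry 2 bits; a single outcome carries none.
print(shannon_entropy([1, 1, 1, 1]), shannon_entropy([5, 0, 0]))
```

A population that collapses onto one amino acid therefore moves towards zero entropy, while an unselected population stays near the maximum.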


Whole-genome sequencing analysis of mutagenic escapees. We performed 
whole-genome sequencing on 20 escapees and their bipA-dependent parental strains. 
Sequencing libraries were prepared according to ref. 43 and sequenced with 150-bp 
paired-end reads on an Illumina MiSeq. We used Millstone
(http://churchlab.github.io/millstone/) to automatically call single-nucleotide variants from raw
fastq data with respect to our starting GRO C321.ΔA (NCBI GenBank Accession
CP006698.1). Thus all variant positions are reported relative to the frame of this
genome. All variant calls are available on GitHub
(https://github.com/churchlab/dependence/tree/master/supplementary_materials). We then filtered these with
custom scripts (https://github.com/churchlab/dependence) to identify alleles in- 
volved in hypothetical escape mechanisms: mutations in tRNAs that could lead to 
UAG suppression, mutations in translation machinery that could increase misin- 
corporation of canonical amino acids, mutations in functionally related genes that 
could functionally complement the essential gene, and mutations in chaperones or 
proteases that could stabilize poorly folded Adk and TyrS proteins. Additionally, 
for strains with adequate coverage we performed de novo assembly of unmapped 
reads to uncover structural variants not reported by Millstone. We used Velvet 
with a hash length of 21 and the following parameters for the graphing step:
-cov_cutoff 20 -ins_length 200 -ins_length_sd 90. We then systematically queried NCBI
BLAST to identify each de novo sequence and biased the BLAST results to prefer 
hits against the canonical MG1655 genome so that we could later group contigs by 
position
(https://github.com/churchlab/dependence/blob/master/supplementary_materials/velvet_contigs_and_BLAST_data_non_permissive_8_strains.csv).
Putative Lon insertions were visually confirmed using Millstone’s JBrowse portal.

Integration of bipARS and tRNAbipA. Genomic integration of bipARS and
tRNAbipA was achieved in two steps by first replacing the endogenous tdk gene with
the ParaBAD-inducible bipARS gene from pEVOL. Subsequently, the pEVOL tRNA
and chloramphenicol resistance gene were inserted immediately downstream of
ParaBAD-bipARS. Kapa HiFi Ready Mix was used to amplify each PCR product (see
Supplementary Table 1 for primer sequences), and λ Red-mediated recombination
was used to introduce the PCR products into the genome. Proper insertion of
the desired cassettes was confirmed by PCR using tdk.seq-f and tdk.seq-r. We observed
that 10 µM bipA was not adequate to support growth of adk.d6 or tyrS.d8
when bipARS was integrated into the genome; however, 100 µM bipA accommodated
robust growth of adk.d6_int, tyrS.d8_int and adk.d6_tyrS.d8_int.

Design of the bipARS.d7 bipA-dependent synthetase. We applied our compu- 
tational second-site suppressor strategy to position V290 of the bipyridylalanyl- 
tRNA synthetase X-ray structure (PDB code 2PXH, chain A). This position 
corresponds to bipA303 in our X-ray structure of tyrS.d7 when the structures 
are superimposed (alignable core backbone root mean square deviation of 3.6 Å).
We hypothesized that this position may be amenable to redesign in homologous 
structures. Six designs covering sequence variability observed in the computa- 
tional models (Supplementary Table 2) were produced by PCR amplification of 
bipARS with mutagenic primers (Supplementary Table 1) and isothermal assembly 
into the pEVOL vector, maintaining only the arabinose-inducible copy of bipARS. 
We also included the D286R mutation previously shown to increase synthetase 
activity in all constructs. Since the bipARS designs should require bipA to trans- 
late, fold and function, all derived strains were initially co-transformed with a 
nonreplicating plasmid (pJTE2, R6γ origin of replication) containing a wild-type
copy of bipARS to jumpstart production of tRNAbipA. Designs were co-transformed
with the jumpstart plasmid into C321.ΔA, and transformants were then transformed
with a previously described GFP reporter plasmid containing a single UAG
codon to measure synthetase activity by GFP fluorescence. One design (bipARS.d7;
T/A/G/G/A/bipA) produced >5-fold bipA-dependent induction of fluorescence 
in permissive media but failed to induce any bipA-dependent fluorescence after 
passaging overnight 1:150 in non-permissive media followed by an identical pas- 
sage in permissive media. Since any functional synthetase remaining after non- 
permissive passaging should facilitate exponential production of additional synthetase, 
this behaviour suggests strong dependence of bipARS.d7 on bipA for translation 
and folding resulting in total clearance of bipARS.d7 and tRNAbipA after overnight
growth in non-permissive conditions. The bipARS.d7/tRNAbipA and jumpstart
vectors were co-transformed into C321.ΔA and then adk.d6 and tyrS.d8 were introduced
as described above.

Growth competition assays. The assayed single- and double-enzyme synthetic 
auxotroph escapee strains (pgk.d4 esc. 1, 2 and 3; adk.d6_tyrS.d8 esc. 1, 2 and 3) 
were transformed with a pZE21 vector bearing mCFP under aTc-inducible control
in the multiple cloning site. The parental prototrophic C321.ΔA strain was
similarly transformed with an identical vector except that the fluorophore is YFP. 
Strains were grown to late-exponential phase in LB^L supplemented with antibiotics
(SDS, chloramphenicol, kanamycin), inducers (0.1% L-arabinose, 100 ng ml^-1
aTc) and bipA. Cells were washed twice in M9 salts and adjusted to a cell concen- 
tration of roughly 1 X 10” cells per ml. Biological replicates of synthetic auxotroph 
escapees were mixed with the C321.ΔA strain at a ratio of 100:1 and diluted to a
seeding concentration of roughly 2.5 × 10^7 cells per ml in non-permissive media
(LB^L supplemented with SDS, chloramphenicol, kanamycin and aTc). Growth
kinetics of the competition mixture were assayed in 200 µl sample volumes on microtitre
plates incubated in a Biotek Synergy microplate reader at 34 °C. Cell mixtures
were fixed in PBS with 1% paraformaldehyde at time 0 and at 8 h. Fixed cells
were run on a BD LSRFortessa cell analyser and populations were binned based on 
YFP expression level. CFP was not used for species discrimination but rather to 
maintain consistent fitness costs associated with episomal DNA maintenance and 
fluorophore expression. 

Bacterial lysate growth assays. All strains were grown up in permissive condi- 
tions and harvested in late-exponential phase. Cells were washed twice in M9 salts 
(6 g l^-1 Na2HPO4, 3 g l^-1 KH2PO4, 1 g l^-1 NH4Cl, 0.5 g l^-1 NaCl) by centrifugation
at 17,900g and then diluted 100-fold into LB^L supplemented with 166.66 ml l^-1
trypsin-digested E. coli extract (Teknova catalogue number 3T3900). Growth
kinetics were assayed in 200 µl sample volumes on microtitre plates as described
above. Three biological replicates were performed by splitting a single well-mixed 
initial seeding population. 

Conjugal escape assays. The conjugal donor population was produced using the 
Epicentre EZ-Tn5 Custom Transposome kit to insert a mosaic-end-flanked kanR-oriT
cassette into random positions of the E. coli MG1655 genome. The population
of integrants was plated on LB^L agar plates supplemented with kanamycin.
Approximately 450 clones were lifted from the plate and pooled, which corresponds 
to one kanR-oriT per ~10-kilobase pair region of the genome, assuming an equal
distribution of transposition across the 4.6-megabase E. coli MG1655 genome. The
pRK24 conjugal plasmid was conjugated from E. coli strain 1100-2 (ref. 47) into
the kanR-oriT donor population. The kanR-oriT insertion sites were confirmed to
be well distributed. In brief, the donor population was sheared on a Covaris E210, 
end repaired, and ligated to Illumina adapters as described by ref. 43. Genomic se- 
quences flanking the insertion site were amplified using the Sol-P5-PCR primer 
and a series of nested primers (Supplementary Table 1) that hybridize within the 
kanR gene. PCR products corresponding to ~1 kilobase pair were gel purified from
the smear and TOPO cloned (Invitrogen pCR-Blunt II-TOPO). Flanking genomic 
sequences were then identified by Sanger sequencing 96 TOPO clones. Conjugal 
escape assays were performed as described previously with 50-min and 12-h conjugal
duration and a donor:auxotroph ratio of 1:100. Three technical replicates of
two biological replicates were performed for all conjugation assays with the excep- 
tion of the double-enzyme synthetic auxotroph experiments, which were performed 
with three biological replicates (3 technical replicates each) to produce enough escapees
for mascPCR screening. To determine the proportion of the genome overwritten
by donor DNA the following numbers of colonies were scored for the 50-min/12-h
time points: adk.d6 n = 51/6; tyrS.d8 n = 44/7; adk.d6_tyrS.d8 n = 8/59;
adk.d6_tyrS.d8_asd:specR n = 5/38. This set omits a small collection of clones 
that could not be scored due to polyclonality. 
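
The donor-library density quoted in this section (about one kanR-oriT per ~10 kilobase pairs) follows from simple arithmetic, sketched below. The function name is hypothetical, and the calculation assumes, as the text does, that the roughly 450 pooled insertions are spread evenly over the 4.6-megabase genome.

```python
def mean_insert_spacing_bp(genome_bp, n_inserts):
    """Average distance between insertions if they are evenly spread."""
    return genome_bp / n_inserts


# 4.6 Mb genome and ~450 pooled clones, as stated in the text.
spacing = mean_insert_spacing_bp(4_600_000, 450)
print(f"~{spacing / 1000:.1f} kb between oriT insertions")
```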

Statistics. No statistical methods were used to predetermine sample size. 

Extended Data Figure 1 | bipA dependence in synthetic auxotrophs.
Prototrophic and synthetic auxotrophic strains were grown in titrations of bipA
and monitored in a microplate reader (Methods). Media for all bipA
concentrations contained SDS, chloramphenicol and arabinose. Doubling
times for three technical replicates are shown. Positive and negative error bars
are s.e.m. Growth was undetectable for synthetic auxotrophs at 0.00 µM,
0.01 µM and 0.10 µM bipA, as well as 0.50 µM bipA for adk.d6_tyrS.d8.

Extended Data Figure 2 | Mass spectrometry of NSAA-dependent enzymes.
Mass spectrometry was performed and peptide spectrum matches (PSMs) were
obtained as described in the Methods. Data sets were culled of minor
contaminant PSMs and re-searched with SEQUEST against adk.d6, tyrS.d7 and
tyrS.d8 sequences without taking into account enzyme specificity. To
interrogate the sequences for bipA, tryptophan and leucine, the amino acid
at the bipA position was given the mass of leucine and searches were performed
with differential modifications of +110.01565 and +72.99525 to account
for the masses of bipA and tryptophan, respectively. In all samples, only
bipA, and not leucine or tryptophan, was detected at these positions.
The PSM for adk.d6 is shown. Peptides observed to contain bipA are
LVEYHQMTAP[bipA]IGYVSK (adk.d6), AQYV[bipA]AEQVTR (tyrS.d7)
and AQYV[bipA]AEQATR (tyrS.d8).

Extended Data Figure 3 | Crystal structure of tyrS.d7. a, Overall structure of
the redesigned enzyme. The N-terminal domain (residues 4-330) that catalyses
tyrosine activation, the carboxy-terminal tRNA-binding domain (residues
350-424) and their connecting region are coloured cyan, blue and yellow,
respectively. The residues 232-241 are disordered (dash line). b, Comparison
between the C-terminal tRNA recognition domains of tyrS.d7 (blue) and of
Thermus thermophilus TyrS (orange; PDB code 1H3E). The residues 352-442
of the hyperthermophilic TyrS are shown. c, The N-terminal domain of the
engineered protein is superposed on the crystal structure of its parental enzyme
(green; PDB code 1X8X). The KMSKS loop of the parental enzyme is
highlighted in magenta. d, Tyrosine molecule bound to tyrS.d7. An electron
density map of L-tyrosine is shown as a grey mesh (2Fo − Fc contoured at 1.2σ;
top panel). A tyrosine and the surrounding protein fold of tyrS.d7 (cyan) are
very similar to those of the wild-type TyrS structure (green; bottom panel).

Extended Data Figure 4 | Western blot analysis of tyrS.d7 variants. Variants
of tyrS.d7 with leucine or tryptophan at the bipA position were expressed as 
GST fusions under identical conditions and analysed by western blot 
(Methods). Soluble protein content was quantified by densitometry and 
normalized to GAPDH. Mutating bipA to leucine or tryptophan reduced 
soluble TyrS levels by 2.5- or 2.1-fold, respectively (*P < 0.05 by two-tailed 
unpaired Student’s t-test with unequal variances). Three technical replicates 
were performed; a representative image is shown. Positive error bars are s.e.m. 

Extended Data Figure 5 | Population selection dynamics for canonical
amino acid substitutions at designed UAG positions. For each plot,
degenerate MAGE oligonucleotides were used to create a population of cells in
which the UAG codon was mutated to all 64 codons. Codon substitutions
leading to survival in the absence of bipA were selected by growth in LB^L media
without bipA and arabinose supplementation. Aliquots of the culture
population were taken at 1 h, 4 h, confluence 1 (once the culture reached
confluence), confluence 2 (after regrowth of a 100-fold dilution of confluence 1)
and confluence 3 (after regrowth of a 100-fold dilution of confluence 2). The
amino acid identity at the bipA position was probed by targeted Illumina 
sequencing. Residual bipA-containing proteins were expected to remain active 
until intracellular protein turnover cleared them from the cell, making the 1-h 
time point a reasonable representation of initial diversity present in the 
population. These data show the relative fitness of amino acid substitutions in a 
given protein variant; relative fitness across multiple protein variants cannot be 
accurately assessed from these data. 

Extended Data Figure 6 | Natural metabolites can circumvent
auxotrophies. a-d, Synthetic auxotrophs of pgk can be complemented by
pyruvate or succinate. Strains were cultured in LB^L in the presence of pyruvate,
succinate, glucose or bipA (10 µM) and monitored by kinetic growth. The
single-enzyme synthetic auxotroph pgk.d4 (a) grows similarly to prototrophic
C321.ΔA (b) in the presence of pyruvate and succinate, but not glucose.
Synthetic auxotrophs of adk (c) and tyrS (d) grow robustly in bipA but cannot
be complemented by pyruvate or succinate. Growth of pgk.d4 and adk.d6
in glucose after 1,000 min is due to mutational escape (loss of bipA
dependence). e, The synthetic auxotroph parental strain (C321.ΔA), a second
prototrophic MG1655-derived strain (EcNR1), and three natural auxotroph
derivatives of EcNR1 were grown in LB^L supplemented with 166.66 ml l^-1
bacterial lysate (Teknova). Growth curves are shown with doubling
times ± one standard deviation of three technical replicates next to the labels.
The conditions fully complement the metabolic auxotrophy of EcNR1.ΔthyA,
which doubles as robustly as prototrophic EcNR1. Strains lacking the asd
gene (EcNR1.Δasd and the EcNR1.ΔasdΔthyA double knockout) show more
impairment but enter exponential growth with doubling times of 91 to 137 min,
respectively. f, g, Single- (f) and double-enzyme (g) synthetic auxotrophies
are not complemented by natural products in rich media or bacterial lysate.
h, When the Δasd auxotrophy is combined with double-enzyme synthetic
auxotrophies the natural products are no longer sufficient to support growth.
No growth is indicated by an asterisk in f-h.

Extended Data Figure 7 | Analysis of the A70V mutation as an escape
mechanism for tyrS.d8. a, The X-ray structure of tyrS.d7 is shown; tyrS.d8 
varies by the single mutation V307A. BipA303, A70 and their neighbouring 
side chains are shown in stick representation, with bipA303 and A70 coloured 
orange. The bound tyrosine substrate is shown in spacefill. The A70V mutation 
(white sticks) may stabilize the catalytic domain when bipA is replaced by 
natural amino acids by tightly packing with neighbouring side chains including
V108. b, Escape frequencies on non-permissive media for three separately 
constructed tyrS.d8 A70V strains are shown for days 1 through 4. Although 
escapees are growth-impaired in the absence of bipA (Supplementary
Table 10), all cells form colonies after 5 days, suggesting that A70V confers
100% survival on non-permissive media. Positive error bars indicate s.e.m. 

Extended Data Figure 8 | Conjugal escape frequencies of synthetic
auxotrophs. Single-, double- and triple-enzyme auxotrophs were assayed to
determine the frequency of escape by HGT and recombination from a
prototrophic donor as described in the Methods. These results highlight the
benefit of having multiple auxotrophies distributed throughout the genome.
Notably, scaling from a single synthetic auxotrophy to three distributed
auxotrophies results in a reduction of conjugal escape by at least two orders of
magnitude. Positive error bars indicate standard deviation.

Extended Data Table 1 | Data collection and refinement statistics

                              tyrS.d7
Data collection
  Space group                 P 1 21 1
  Cell dimensions
    a, b, c (Å)               81.3, 67.2, 90.7
    α, β, γ (°)               90.0, 102.6, 90.0
  Resolution (Å)              50.0 - 2.65 (2.74 - 2.65)*
  Rsym or Rmerge              0.074 (0.497)
  I/σI                        29.2 (4.65)
  Completeness (%)            99.0 (98.4)
  Redundancy                  7.6 (7.7)

Refinement
  Resolution (Å)              45 - 2.65
  No. reflections             26407
  Rwork / Rfree               0.222/0.306
  No. atoms
    Protein
    Ligand/ion
    Water
  B-factors
    Protein
    Ligand/ion
    Water
  R.m.s. deviations
    Bond lengths (Å)
    Bond angles (°)

The data were collected using a single crystal.
* Highest-resolution shell is shown in parentheses.

Extended Data Table 2 | Cost per litre of culture for commonly used NSAAs

NSAA   Vendor     Name at vendor             CAS#              MW       Cat# for 1 g   Price of 1 g   Conc. (mM)   Cost per litre
pAcF   peptech    L-4-Acetylphenylalanine    122555-04-8       207.23   AL624-1        $500.00        1.0          $103.62
pAzF   Bachem     H-4-Azido-Phe-OH           33173-53-4        206.2    F-3075.0001    $285.00        5.0          $293.84
pCNF   peptech    L-4-Cyanophenylalanine     167479-78-9       190.2    AL240-1        $150.00        1.0          $28.53
bpa    peptech    L-4-Benzoylphenylalanine   104504-45-2       269.3    AL660-1        $100.00        1.0          $26.93
napA   peptech    L-2-Naphthylalanine        58438-03-2        215.25   AL121-1        $80.00         1.0          $17.22
bipA   peptech    L-4,4'-Biphenylalanine     155760-02-4       241.29   AL506-1        $150.00        0.1          $3.62
pIF    peptech    L-4-Iodophenylalanine      24250-85-9        291.09   AL261-1        $40.00         1.0          $11.64
bipyA  Asis Chem  Bipyridylalanine           custom synthesis  245.282  (25 g price)   $10,000/25 g   1.0          $98.11
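
The final column of this table appears to follow from the others: price per gram, multiplied by the grams needed to reach the listed millimolar concentration in one litre. The sketch below reproduces three rows under that assumption; reading the second-to-last column as a working concentration in mM is inferred from the numbers rather than stated in the table, and the function name is hypothetical.

```python
def cost_per_litre(price_per_gram, mol_weight, conc_mM):
    """Cost (US$) of supplementing 1 litre of culture with an NSAA.

    grams per litre = (conc_mM / 1000) mol per litre * mol_weight g per mol
    """
    grams_per_litre = conc_mM / 1000 * mol_weight
    return price_per_gram * grams_per_litre


# (NSAA, price of 1 g in US$, molecular weight, concentration in mM)
rows = [
    ("pAcF", 500.00, 207.23, 1.0),
    ("pAzF", 285.00, 206.2, 5.0),
    ("bipA", 150.00, 241.29, 0.1),
]
for name, price, mw, conc in rows:
    print(f"{name}: ${cost_per_litre(price, mw, conc):.2f} per litre")
```

For bipyA the table quotes a 25 g price, so the per-gram price passed to the function would be $10,000 / 25.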

Mechanistic insights into the recycling
machine of the SNARE complex 


Minglei Zhao, Shenping Wu, Qiangjun Zhou, Sandro Vivona, Daniel J. Cipriano, Yifan Cheng & Axel T. Brunger


Evolutionarily conserved SNARE (soluble N-ethylmaleimide sensitive factor attachment protein receptors) proteins 
form a complex that drives membrane fusion in eukaryotes. The ATPase NSF (N-ethylmaleimide sensitive factor), 
together with SNAPs (soluble NSF attachment protein), disassembles the SNARE complex into its protein components, 
making individual SNAREs available for subsequent rounds of fusion. Here we report structures of ATP- and ADP-bound 
NSF, and the NSF/SNAP/SNARE (20S) supercomplex determined by single-particle electron cryomicroscopy at near- 
atomic to sub-nanometre resolution without imposing symmetry. Large, potentially force-generating, conformational 
differences exist between ATP- and ADP-bound NSF. The 20S supercomplex exhibits broken symmetry, transitioning 
from six-fold symmetry of the NSF ATPase domains to pseudo four-fold symmetry of the SNARE complex. SNAPs interact 
with the SNARE complex with an opposite structural twist, suggesting an unwinding mechanism. The interfaces between 
NSF, SNAPs, and SNAREs exhibit characteristic electrostatic patterns, suggesting how one NSF/SNAP species can act on
many different SNARE complexes.


Membrane fusion is essential for many physiological processes in eukaryotic
cells, including protein and membrane trafficking, hormone secretion,
and neurotransmission. The evolutionarily conserved SNARE
proteins have a key role in these processes. Specific combinations of
SNARE proteins are located on opposite membranes. Upon zippering
into a highly stable four-helix bundle (the SNARE complex) they provide
the energy for membrane fusion. The combinations of SNARE
proteins depend on the source of vesicles and the identity of target membranes,
but other factors also contribute to the specificity of the membrane
targeting. To maintain the pool of individual SNARE proteins,
the ATPase NSF, together with SNAPs, disassembles post-fusion and
non-productive SNARE complexes into individual protein components
using the energy from ATP hydrolysis.

NSF was the first protein found to play a key role in eukaryotic
trafficking. It is a member of the AAA+ (ATPases associated with diverse
cellular activities) superfamily of ATPases, and it forms a homomeric
hexamer with a molecular weight of ~500 kDa, with each protomer
consisting of an amino-terminal domain (termed N) and two ATPase
domains (termed D1 and D2) (Fig. 1a). The D1 domains are responsible
for most of the ATPase activity of NSF, whereas the D2 domains
are primarily responsible for hexamerization. The N domains are
involved in SNAP and possibly SNARE binding. Prior to ATP hydrolysis,
NSF, SNAP, and the SNARE complex form the so-called 20S
supercomplex.

Individual components of the 20S supercomplex have been structurally characterized, including 
the crystal structures of several SNARE complexes, SNAPs, and the D2 and N domains of NSF. 
Structural studies of full-length NSF and the 20S supercomplex have also been carried out using 
quick-freeze/deep-etch and negative-staining electron microscopy, and electron cryomicroscopy 
(cryo-EM). However, owing to the low resolution of these studies, the detailed molecular 
architecture of the 20S supercomplex is unknown and critical questions remain to be answered: 
how the adaptor protein SNAP recognizes SNARE complexes; how many SNAPs are involved; how one 
NSF/SNAP species disassembles many different SNARE complexes in a promiscuous fashion; and what 
the molecular mechanism of disassembly is. 

Here we present the structures of full-length NSF in two different nucleotide states (ATP- and 
ADP-bound, at 4.2 Å and 7.6 Å resolution, respectively), and structures of two different 20S 
supercomplexes involving different SNARE substrates determined by single particle cryo-EM, 
ranging from 7.6 to 8.4 Å resolution. The cryo-EM structures reveal large conformational 
differences of NSF between ATP- and ADP-bound states, and upon binding to SNAPs and SNAREs. We 
confirmed by site-directed mutagenesis that the molecular interfaces between SNAPs, SNAREs, and 
NSF play important roles in disassembly, and propose that recognition at these interfaces is 
based on characteristic electrostatic patterns. Based on these new insights we speculate about 
the molecular mechanisms of NSF-mediated SNARE complex disassembly. 


ATP- and ADP-bound NSF structures 


We developed a new purification scheme to address the heterogeneity 
of NSF samples caused by mixtures of nucleotide states. In essence, 
hexameric NSF was monomerized by completely removing the bound 
nucleotides through size-exclusion chromatography (SEC) (Methods 
and Extended Data Fig. 1a, b). The resulting NSF protomers could be 
reassembled into hexamers in the presence of the desired nucleotide. 
The reassembled ATP- and ADP-bound NSF hexamers were studied 
by single particle cryo-EM; EDTA was included to prevent hydrolysis. 
Our reassembled NSF hexamers are functionally active (Extended 
Data Fig. 1g and Methods). 

The reconstruction of ATP-bound NSF is shown in Fig. 1b, c and Extended Data Fig. 2. The 
reconstruction has an estimated overall resolution of 4.2 Å after masking out flexible N domains 
(Extended Data Fig. 2e and Extended Data Table 1). All D2 domains, and five out of six D1 
domains, were well resolved in the final three-dimensional (3D) density map (Fig. 1b and 
Extended Data Figs 2d and 3a-c). Consistent with the estimated resolution, we observed grooves 
in α-helices, β-strands within β-sheets, and backbone zigzags corresponding to ~3.8 Å Cα 
distances, along with densities of some aromatic side chains (Extended Data Fig. 3a, b). The 
crystal structure of the ATP-bound NSF D2 hexamer was readily docked into the corresponding 
cryo-EM density, followed by refinement. The densities of the D1 domains were of sufficient 
quality to build and refine a de novo atomic model of the D1 domain with bound ATPs. The D1 
domain is a typical AAA+ module with two subdomains (α and α/β) and motifs that are generally 
found in ATPases (Extended Data Fig. 3d, e). The α2 helix of the D1 domain is bent (Extended 
Data Fig. 3d), a distinctive feature compared to the straight α2 helices found in the D2 domain 
of NSF, as well as in both D1 and D2 domains of the closest relative, the AAA+ ATPase 
valosin-containing protein (VCP/p97). 

Figure 1 | 3D density maps of ATP- and ADP-bound NSF. a, Domain 
diagram of the NSF protomer. b, Different views of the sharpened map (6.5σ) 
of ATP-bound NSF filtered to a resolution of 4.2 Å with each domain colour-coded 
to match the domain diagram in panel a. A single chain of NSF 
(protomer) is coloured in gold to help with visualization. The density of one D1 
domain (subsequently referred to as the D1 domain in chain F) is not well 
resolved (see black arrow). c, Different views of the unsharpened map (1.8σ) 
of ATP-bound NSF showing the positions of the N domains. d, Different 
views of the sharpened map (6.8σ) of ADP-bound NSF filtered to a resolution 
of 7.6 Å with each domain colour-coded to match the domain diagram in panel 
a. A single chain of NSF (protomer) is coloured in gold to help with 
visualization. e, Different views of the unsharpened map (1.3σ) of ADP-bound 
NSF showing the positions of the N domains. The gap in the D1 ring is 
indicated by a black arrow. 

The 3D reconstruction of ADP-bound NSF is shown in Fig. 1d, e. The overall estimated resolution 
is 7.6 Å (Extended Data Fig. 4 and Extended Data Table 1), with well-resolved tubular densities 
for α-helices (Fig. 1d). To obtain an atomic model of ADP-bound NSF, we docked our cryo-EM 
structure of the D1 domain protomer (obtained from ATP-bound NSF) and the crystal structure of 
the D2 domain hexamer into the corresponding cryo-EM densities, followed by refinement. To 
complete the models, the crystal structure of the NSF N domain was docked into the corresponding 
densities (Methods). 

The cryo-EM data sets of the NSF particles used in this study were 
of sufficient quality to determine and refine 3D reconstructions to high 
resolution without imposing any symmetry, which turned out to be crit- 
ical for resolving the NSF N domains and asymmetries in the ATPase 
rings (Extended Data Fig. 5, see Supplementary Information for a detailed 
discussion). 


Asymmetric features of ATP- and ADP-bound NSF 


Both the ATP- and ADP-bound structures of NSF are organized into three layers: two rings 
consisting of six D2 domains and six D1 domains, respectively, and a layer of N domains, six for 
ATP-bound NSF and four for ADP-bound NSF (Figs 1c, e and 2a, b). For ADP-bound NSF, the 
remaining two N domains are flipped along the sides of the ATPase rings, with well-resolved 
densities compared to the N domains atop the D1 ring, leaving little doubt as to the identity of 
these two densities (Fig. 1e and Extended Data Figs 4c, e and 7c). 

For ATP-bound NSF, the D2 ring is planar and approximately six-fold symmetric. The D1 ring is 
reminiscent of a right-handed ‘split washer’, with each chain stepping up about 5 Å as 
manifested by the relative positions of the α2 helix in the D1 domains (Fig. 2c and Extended 
Data Fig. 6a). Chain F (purple) is an exception, which does not step up relative to chain E 
(blue), but instead slightly steps down towards chain A (red); there is a large step down from 
chain F to chain A. The D1 domain of chain F was not as well resolved in the density map as the 
other D1 domains (Fig. 1b), indicating its potential flexibility (Supplementary Video 1). 
However, the density for this domain is clear enough to indicate that the α subdomain has a 
different position relative to the α/β subdomain, compared to the other five D1 domains that can 
all be well superposed (Extended Data Fig. 6c). Densities for ATP molecules are clearly visible 
in the nucleotide-binding pockets of the D1 domains of chains A through E, and in all D2 domains 
(Extended Data Fig. 3f, g); however, there is no clear density in the nucleotide-binding pocket 
of the D1 domain of chain F. 

Figure 2 | Structures of ATP- and ADP-bound NSF. a, Side-view of ATP-bound 
NSF. b, Side-view of ADP-bound NSF. The six protomer chains are 
rainbow coloured anticlockwise based on the relative positions of the D1 
domains to the D2 ring in the ATP-bound NSF model; the chain with the 
closest distance between D1 and D2 domains is named chain A (red). 
Nucleotides are shown as grey surfaces. See Methods for generation and 
refinement of the atomic models. c, A schematic diagram showing the topology 
of ATPase rings of ATP- and ADP-bound NSF, respectively. D1 rings are 
coloured according to the models shown in panels a and b. 

For ADP-bound NSF, the D2 ring slightly deviates from a near perfect six-fold symmetric 
conformation, producing a small gap (Extended Data Fig. 7a). The D1 ring is more expanded and 
planar compared to ATP-bound NSF (Extended Data Figs 6b and 7b), with a large opening between 
chains A and F that coincides with the small gap in the D2 ring; that is, the D1 ring forms an 
open ‘flat washer’ (Fig. 2c). The structures of all six D1 domains can be well superposed, but 
adopt different orientations relative to the D2 domains (Extended Data Fig. 6b, d). 

When superposing the α/β subdomains of all the D1 domains of both ATP- and ADP-bound NSF (except 
for the flexible chain F in ATP-bound NSF), the α7 helix in the α subdomain is translated 
between the ATP- and ADP-bound states (Extended Data Fig. 6e). The nucleotides are likely absent 
in the D1 ring of ADP-bound NSF as the conformations would not favour binding of nucleotides 
because of possible clashes between the nucleotide and the translated α7 helix (Extended Data 
Fig. 6e). This conformational change of the α subdomain is correlated with the large difference 
of the D1 ring between the ATP- and ADP-bound states. The ATP-loaded D1 ring is more compact, 
with a total interface area of 5938 Å² compared to that of 3746 Å² in the ADP-bound state, 
resembling a spring-like transition from a ‘loaded’ split-washer state to a ‘relaxed’ 
open-flat-washer state (Extended Data Fig. 7b and Supplementary Video 2). This transition is 
further correlated with outward rotations of the D1 domains of chains A and B and the changes in 
their N domains (Extended Data Figs 6b and 7b, c). For a detailed comparison of our NSF 
structures with other members of the AAA+ family, see the Supplementary Information. 
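
As a quick arithmetic check on the interface areas quoted above, the relative loss of D1-ring interface area between the ATP-bound and ADP-bound states can be worked out directly. The sketch below is illustrative only: the two areas are the values reported in the text, whereas the function and variable names are assumptions introduced here.

```python
# Total D1-ring interface areas reported in the text (square angstroms).
AREA_ATP = 5938.0  # ATP-bound, 'loaded' split-washer state
AREA_ADP = 3746.0  # ADP-bound, 'relaxed' open-flat-washer state


def fractional_loss(before: float, after: float) -> float:
    """Return the fraction of `before` that is lost on going to `after`."""
    if before <= 0:
        raise ValueError("reference area must be positive")
    return (before - after) / before


loss = fractional_loss(AREA_ATP, AREA_ADP)
print(f"Interface area lost: {AREA_ATP - AREA_ADP:.0f} A^2 ({loss:.1%})")
```

With the reported values, the D1 ring loses roughly 37% of its total interface area on going from the ATP-bound to the ADP-bound state.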


Structures of the 20S supercomplex 


We prepared 20S supercomplex consisting of AMPPNP-bound hexameric NSF, αSNAP, and neuronal SNARE 
complex that is composed of syntaxin-1A, synaptobrevin-2/VAMP-2 (vesicle-associated membrane 
protein 2), and SNAP-25 (synaptosomal-associated protein 25) (Methods and Extended Data 
Fig. 1c-f). We used a truncated neuronal SNARE complex (green shaded fragments in Fig. 3a), 
identical to the one whose high-resolution structure had been determined; it was chosen because 
it remains monomeric in solution even at high concentration, and is sufficient for reversible 
assembly and disassembly (Extended Data Fig. 1g). The 3D classification of the single particle 
cryo-EM data of the 20S supercomplex produced four different reconstructions that each 
represent an asymmetric molecular state of 20S, referred to as states I, II, IIIa, and IIIb (see 
Methods). The corresponding refined density maps of the 20S supercomplex (without symmetry) have 
an overall resolution ranging from 7.6 to 8.4 Å after gold-standard refinement by RELION 
(Extended Data Fig. 8f). 

The structures of the four states of the 20S supercomplex have the same overall architecture, so 
we discuss state I as a representative of all states in the following; detailed differences 
between the states are discussed later. The 20S supercomplex resembles a tower with different 
domains organized into layers (Fig. 3b). At the base of the tower (in the orientation shown in 
Fig. 3b, middle panel) are the D2 and D1 ATPase rings of NSF, and at the top of the tower is a 
‘spire’, made up of four αSNAP molecules and one SNARE complex, surrounded by the six N domains 
of NSF. The 20S supercomplex is a striking example of broken symmetry: the approximate six-fold 
symmetry at the base of the complex is progressively violated in the D1 and N domains, allowing 
the complex to transit from six-fold symmetry to a pseudo four-fold symmetry at the top (Fig. 3b 
and Extended Data Fig. 8e). 

The four α-helix bundle of the SNARE complex at the centre of the spire is clearly visible along 
with its characteristic twisted left-handed grooves, although the chemical identity of each 
polypeptide chain cannot be uniquely assigned at the available resolution (Fig. 3b and Extended 
Data Fig. 3l). The densities of the six N domains of NSF are not as well resolved compared to 
the other components of the supercomplex, but they are significantly better defined than those 
of ATP- and ADP-bound NSF alone. For about half of the six N domains, the density has a 
characteristic kidney shape as expected from the crystal structure of the N domain (Extended 
Data Fig. 3j). To obtain an atomic model of the entire 20S supercomplex, we docked the crystal 
structures of the D2 and N domains, our cryo-EM structure of the D1 domain (Fig. 2a), a homology 
model of αSNAP derived from the crystal structure of yeast homologue Sec17 (ref. 17), and the 
crystal structure of the truncated neuronal SNARE complex into the cryo-EM density map. 
Real-space rigid body minimization was carried out, followed by reciprocal space refinement as 
described in Methods. The resulting models fit the density well (Fig. 3c, Extended Data 
Fig. 3h-l, and Supplementary Video 3). 

Figure 3 | 3D density map and structure of state I of the 20S supercomplex. 
a, Domain diagrams of the 20S supercomplex consisting of NSF, SNAP, 
synaptobrevin-2 (Syb-2), syntaxin-1A, and SNAP-25. The truncated neuronal 
SNARE complex consists of four SNARE domains (green) from synaptobrevin-2, 
syntaxin-1A and SNAP-25 (two SNARE domains). The boundaries of 
domains and lengths of proteins are indicated above the domain diagrams. 
Boundaries of the fragments of the truncated neuronal SNARE complex are 
indicated below the diagrams. b, Different views of the sharpened map (4.8σ) of 
state I of the 20S supercomplex filtered to a resolution of 7.6 Å, with each 
domain colour coded to match the domain diagram in panel a. c, Three-dimensional 
model of state I of the 20S supercomplex. Each model is in the same 
orientation as in panel b. 


ATPase rings are tightened upon SNAP/SNARE binding 


When comparing the ATPase domains of the AMPPNP-bound 20S supercomplex to those of ATP-bound 
NSF, there are similar overall features, along with some important differences. The overall 
root-mean-square deviation (r.m.s.d.) of the main chain atoms of D1 and D2 ATPase rings is 4 Å, 
based on a superposition of the D1 ring. The D2 ring of the 20S supercomplex is approximately 
six-fold symmetric, whereas the D1 ring has a split-washer-like arrangement, similar to that of 
ATP-bound NSF (Extended Data Fig. 7d). Overall, the ATPase rings of the 20S supercomplex adopt a 
tighter conformation than NSF alone (Extended Data Fig. 7e). While there is a uniform increase 
in interface areas for the D2 ring, the changes are more complex for the D1 ring (Extended Data 
Fig. 7f). These interface area changes are correlated with conformational differences of the α 
subdomain of chain A and the α/β subdomain of chain F (see Extended Data Fig. 3d for definition 
of these subdomains), as illustrated by the relative positions of α7 helices in the two 
structures (Extended Data Fig. 7f, inset). Overall, the ATPase rings of NSF are tightened upon 
binding to SNAPs and SNARE complex, resembling a spring being loaded (Supplementary Video 4). 
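
For readers unfamiliar with the r.m.s.d. figure quoted above, the following minimal sketch shows how such a value is computed once two sets of equivalent atoms have been superposed. It is not the authors' pipeline: the function name and the toy coordinates are assumptions made for illustration, and the superposition step itself is taken as already done.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    coordinates that are already superposed and list equivalent atoms in the
    same order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (made up, not taken from the NSF models): every atom in
# model_b is displaced by the vector (0, 3, 4), i.e. by 5 angstroms.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
model_b = [(x, y + 3.0, z + 4.0) for (x, y, z) in model_a]
print(f"r.m.s.d. = {rmsd(model_a, model_b):.1f} A")
```

Because the value depends on which atoms are used for the superposition (here, the D1 ring in the text), an r.m.s.d. is only meaningful together with that choice.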


Four states of the 20S supercomplex 


While the D1/D2 ATPase rings of NSF are very similar among the four states, the αSNAP-SNARE 
spire and the N domains differ (Fig. 4a). The four states were grouped into three observed 
patterns (I, II, and III) based on the mode of interaction between αSNAP molecules and N domains 
(Fig. 4b). Considering that each αSNAP can interact with either one or two nearby N domains, one 
expects a total of nine theoretical patterns taking into account the split-washer asymmetry of 
the D1 ring (see Supplementary Discussion). However, only three patterns consistently emerged in 
the 3D classification. One explanation for this phenomenon is that the position of the 
SNAP-SNARE spire is not random, which might favour certain patterns over the others. Indeed, the 
centre of the spire (the SNARE complex) is always located close to chains E and F, which are at 
the raised edge of the D1 split washer (arrows in Fig. 4b), suggesting possible interactions 
between the pore loops (YVG motif) of D1 domains and the SNARE complex. For pattern III, two 
subclasses were refined separately, resulting in states IIIa and IIIb. The main difference 
between the two states involves the relative position of the spires, not the pattern of αSNAP 
and N domain interactions (Supplementary Video 5). Thus, the 20S supercomplex exhibits four 
major states, which are mainly characterized by the patterns of the N domains and the position 
of the αSNAP-SNARE spire, whereas the conformations of the spire and base themselves do not 
differ much. 


Multi-modal interactions between N domains and αSNAP 


The structures of the four states of 20S reveal eight instances of αSNAP molecules that are 
interacting with two N domains of NSF, and another eight instances of αSNAP molecules that are 
interacting with one N domain (Fig. 4). When superposing αSNAP molecules separately based on the 
1:2 and 1:1 binding scenarios, two distinct N-domain binding sites on the surface of the 
C-terminal region of αSNAP appear in the case of 1:2 binding, whereas in the case of 1:1 
binding, the N domains bind somewhere between these two sites (Fig. 5a). The electrostatic 
potential surface of the C-terminal region of αSNAP is quite negative, and both distinct binding 
sites are located in this negatively charged area (Fig. 5b). The N domains of NSF interact with 
either of the two binding sites on αSNAP via the same positively charged area (Fig. 5c). The two 
interfaces involve five positively charged residues of the N domain of NSF and eight negatively 
charged residues of the C-terminal region of αSNAP (Fig. 5d). 

Figure 4 | Top views of the four states of 20S supercomplex. The identified 
four states were aligned with respect to the D1 rings. a, Sharpened maps 
(state I: 4.7σ, state II: 4.5σ, state IIIa: 4.2σ, state IIIb: 4.0σ). b, Schematic 
drawings to help with visualization. The N domains and αSNAP molecules are 
shown as spheres and squares, respectively. The D1 rings are rainbow coloured 
using the same scheme as in Fig. 2, with black arrows indicating the split 
between chain A and chain F. The D2 rings are omitted for clarity. Each N 
domain is labelled with its corresponding chain identifier. The numbers of 
particles that contributed to the reconstruction of each state are listed. 

64 | NATURE | VOL 518 | 5 FEBRUARY 2015 

Previous mutagenesis studies suggested that certain positively charged residues of the N domains 
are important for αSNAP and SNARE binding, and that the C-terminal region of αSNAP is important 
for 20S formation, so our structures of the 20S supercomplex now provide the molecular 
explanation for these previous results. To further validate our observed interfaces with a 
functional assay, we performed mutagenesis of αSNAP based on our 20S structures, and used a 
fluorescence dequenching-based assay to monitor the kinetics of SNARE complex disassembly. 
Indeed, αSNAP mutants of either site I (quadruple mutant: D217A/E249K/E252K/E253K) or site II 
(C-terminal truncation: ΔC) showed impaired kinetics of SNARE complex disassembly as well as 
slower initial reaction rates (Fig. 5e, f). Simultaneous mutations of both sites resulted in 
completely inactive NSF/SNAP. 

Figure 5 | Interactions between αSNAPs and N domains. a, Superposition of 
αSNAP molecules and N domains from the structures of the four states of 20S 
(Fig. 4). In eight instances one αSNAP interacts with two N domains (left). 
In the other eight instances one αSNAP interacts with one N domain (right). 
The superposition was performed with respect to the αSNAP molecules. The 
two distinct binding sites are highlighted by dotted boxes. b, Electrostatic 
potential surface of αSNAP. The two distinct binding sites of the N domains 
are highlighted. c, Electrostatic potential surface of the N domain. Only the 
surface area interacting with αSNAP molecules is shown. d, Ribbon 
representation of αSNAP and N domain interactions. The SNARE complex is 
shown to help with visualization. Side chains of charged residues involved in the 
interactions are shown, with Asp and Glu coloured in red and Arg and Lys 
coloured in blue. e, Kinetic curves of the fluorescence dequenching-based 
SNARE complex disassembly assay using wild-type αSNAP and specified 
αSNAP mutants. f, Corresponding initial SNARE complex disassembly rates. 


Promiscuous interaction between αSNAPs and SNAREs 


In the structures of the 20S supercomplex, four αSNAP molecules wrap around the SNARE complex in 
a four-fold rotational symmetric arrangement, despite the fact that the SNARE complex itself is 
only pseudo-symmetric, that is, the four α-helices consist of different peptide chains 
(Fig. 6a). Even more remarkably, the αSNAP barrel has a right-handed twist in contrast to the 
left-handed twist of the four α-helix bundle of the SNARE complex. 

Figure 6 | Interactions between αSNAP molecules and SNARE complex. 
a, A ribbon representation of the αSNAP-SNARE subcomplex for state I of 20S 
supercomplex. b, Electrostatic potential surface of the SNARE complex. 
Dotted lines indicate the location of the ionic layer at the centre of the 
SNARE complex. c, Electrostatic potential surface of αSNAP. Only the surface 
area interacting with the SNARE complex is shown. The dotted lines indicate 
the location of the ionic layer of the SNARE complex. d, Cross-sections of 
the electrostatic potential surface of the αSNAP barrel. Three regions that 
may interact with the SNARE complex are highlighted and labelled. The 
locations of the cross-sections are illustrated in the insets. e, Kinetic curves of 
the fluorescence dequenching-based SNARE complex disassembly assay using 
wild-type αSNAP and specified αSNAP mutants. f, Corresponding initial 
SNARE complex disassembly rates. 

We infer that the C terminus of the SNARE complex is facing away from NSF since it would 
normally continue into the transmembrane α-helices of the SNAREs syntaxin-1A and synaptobrevin-2 
(transmembrane domains were not included in the constructs used). This inference is consistent 
with the orientations of αSNAPs, the hydrophobic loops of which are pointing away from NSF for 
possible membrane association (Fig. 6a). Moreover, our structures provide a molecular 
explanation for the previous finding that the trans-SNARE complex before membrane fusion is 
resistant to NSF disassembly: the αSNAP-SNARE subcomplex would not be able to form since the 
trans-SNARE complex is probably in a half-zippered state. 

The electrostatic potential surface of the SNARE complex has a highly conserved pattern with 
negative charges at the centre (Fig. 6b). The interacting surface of αSNAP has two extruding 
positively charged residues (K122, K163) close to the central ionic layer (zero layer) of the 
SNARE complex, and another two close to the C-terminal region of αSNAP (K203, R239) (Fig. 6c, 
d). Previous mutagenesis studies suggested the importance of K122, K163, and K203 (ref. 41). To 
further test the functional importance of these possible interactions, we mutated these two 
groups of residues, as well as a conserved concave negative patch close to the N terminus of 
αSNAP (E39, E40, E43, D80), and performed the SNARE complex disassembly assay. Remarkably, the 
K122E/K163E double mutant was completely inactive and the K203E/R239E double mutant showed 
impaired kinetics (Fig. 6e, f). The negative-patch quadruple mutant E38A/E40A/E43A/D80A affected 
kinetics to a lesser degree, with slightly decreased initial reaction rates; the presence of a 
membrane may make this interaction between SNAPs and SNAREs more important. 
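
To make the notion of an "initial reaction rate" concrete, the sketch below estimates it as the least-squares slope through the first few points of a kinetic trace. This is an assumed, generic procedure and not necessarily the one used for Figs 5f and 6f; the function name, the number of points fitted, and all data values are hypothetical.

```python
def initial_rate(times, signal, n_points=5):
    """Least-squares slope through the first `n_points` of a kinetic trace."""
    t = list(times[:n_points])
    y = list(signal[:n_points])
    if len(t) < 2 or len(t) != len(y):
        raise ValueError("need at least two matching time/signal points")
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    numerator = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    denominator = sum((ti - t_mean) ** 2 for ti in t)
    if denominator == 0:
        raise ValueError("time points must not all be identical")
    return numerator / denominator


# Made-up dequenching traces (arbitrary fluorescence units versus seconds).
times = [0, 10, 20, 30, 40, 50, 60]
wild_type = [0.0, 0.5, 1.0, 1.5, 2.0, 2.3, 2.5]
mutant = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

rate_wt = initial_rate(times, wild_type)
rate_mut = initial_rate(times, mutant)
print(f"mutant initial rate relative to wild type: {rate_mut / rate_wt:.2f}")
```

Restricting the fit to the early, approximately linear part of the curve keeps substrate depletion from flattening the estimate; reporting mutant rates relative to wild type makes traces from different experiments comparable.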


Variable αSNAP stoichiometry 


To investigate how NSF disassembles a different SNARE complex, we prepared a SNARE complex 
consisting of VAMP-7 (vesicle-associated membrane protein 7), syntaxin-1A and SNAP-25 (referred 
to as the V7-SNARE complex); this complex includes the N-terminal Habc domain of syntaxin-1A and 
the N-terminal Longin domain of VAMP-7 (Fig. 7a). We assembled the 20S supercomplex with NSF and 
αSNAP (referred to as the V7-20S supercomplex, Extended Data Fig. 9a-c). This complex is 
functionally active since the V7-SNARE complex is disassembled upon ATP hydrolysis. We 
determined a cryo-EM reconstruction to an estimated resolution of 8.0 Å without imposing any 
symmetry (Extended Data Fig. 9d-f). Only two αSNAP molecules bound to the rather inclined SNARE 
complex bundle (Fig. 7b, c). Note that the two N-terminal domains of the V7-SNARE complex and 
two of the six N domains of NSF were not visible. Although the spire is not as well resolved as 
in the 20S supercomplex (as indicated by the lack of separation of the four α-helices of the 
V7-SNARE complex), the cryo-EM reconstruction of V7-20S revealed that NSF can use fewer αSNAP 
molecules and readjust the N domains when binding to different SNARE complexes. The supercoil 
axis of the truncated neuronal SNARE complex in the 20S structures is approximately 
perpendicular to the plane of the ATPase rings, whereas in the V7-20S structure the V7-SNARE 
complex is angled at 76 degrees relative to the plane of ATPase rings (Fig. 7b, d). Despite 
these differences, the mode (for example, right-handed twist) by which αSNAP interacts with 
SNARE complex and the location of the SNARE complex atop the D1 ATPase ring in V7-20S are 
similar to those of 20S (compare Fig. 3c and Fig. 7c). 
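
The inclination quoted above is an angle between a line (the long axis of the SNARE bundle) and a plane (the ATPase rings). As a minimal geometric sketch, assuming the plane is described by its normal vector, this angle is the arcsine of the normalized dot product between the axis and the normal. The function name and the example vectors are hypothetical and are not derived from the deposited models.

```python
import math


def angle_to_plane_deg(axis, plane_normal):
    """Angle in degrees between a line with direction `axis` and a plane
    with normal `plane_normal` (0 = lying in the plane, 90 = perpendicular)."""
    dot = sum(a * n for a, n in zip(axis, plane_normal))
    norm_axis = math.sqrt(sum(a * a for a in axis))
    norm_normal = math.sqrt(sum(n * n for n in plane_normal))
    if norm_axis == 0 or norm_normal == 0:
        raise ValueError("vectors must be non-zero")
    # abs() ignores the arbitrary sign of either vector; min() guards against
    # rounding slightly above 1 before taking the arcsine.
    sine = min(1.0, abs(dot) / (norm_axis * norm_normal))
    return math.degrees(math.asin(sine))


# Illustrative geometry: the ring plane is the xy-plane (normal along z).
ring_normal = (0.0, 0.0, 1.0)
upright_axis = (0.0, 0.0, 1.0)  # perpendicular, as for the neuronal complex
tilt = math.radians(76.0)
tilted_axis = (math.cos(tilt), 0.0, math.sin(tilt))  # inclined bundle

print(f"{angle_to_plane_deg(upright_axis, ring_normal):.0f} degrees")
print(f"{angle_to_plane_deg(tilted_axis, ring_normal):.0f} degrees")
```

In practice the axis and the plane normal would first have to be fitted to atomic coordinates (for example, by principal component analysis of the helix and ring atoms), a step that is omitted here.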

To further confirm the stoichiometry of the αSNAP-SNARE subcomplexes in solution, we conducted 
composition-gradient multi-angle light scattering (CG-MALS) experiments using the two different 
SNARE complexes mixed with αSNAP in a composition gradient (Extended Data Fig. 10a, b). The 
CG-MALS data analysis revealed that αSNAP binds to the truncated neuronal SNARE complex at a 
maximum ratio of 4:1, whereas it binds to V7-SNARE complex at a maximum ratio of 2:1 (Extended 
Data Fig. 10c, d). In solution, multiple species of the αSNAP-SNARE subcomplex are in 
equilibrium, but the cryo-EM structures of both 20S and V7-20S suggest that NSF catches the 
saturated complex in both cases (Extended Data Fig. 10e, f). 

©2015 Macmillan Publishers Limited. All rights reserved 

[Figure 7a panel labels: V7-SNARE complex; VAMP-7 (1-190); Syntaxin-1A (1-265); SNAP-25 (1-206)] 

Figure 7 | 3D density map and structure of V7-20S supercomplex. 
a, Domain diagram of the V7-SNARE complex. Transmembrane domains 
(grey) were not included in the complex. The two N-terminal domains of 
VAMP-7 and syntaxin-1A are highlighted by red boxes. b, Different views 
of the 3D density map (4.5σ) coloured similarly to Fig. 3b. The angle between 
the long axis of V7-SNARE complex and the plane of ATPase rings is 
approximately 76 degrees as shown in the inset. c, Corresponding views of the 
atomic model fit to the density map of V7-20S. Note that each of the two 
αSNAP molecules interacts with two NSF N domains. d, An illustration of the 
top view similar to Fig. 4b. The location of the N termini of the V7-SNARE 
α-helix bundle is indicated by a green dotted circle. 

Figure 8 | Model of NSF-mediated SNARE complex disassembly. The 
model consists of four stages (a-d). The model refers to the neuronal SNARE 
complex (consisting of synaptobrevin-2 (Syb2), syntaxin-1A (Stx1A), and 
SNAP-25) and αSNAPs, but the model is also applicable to other SNARE 
complexes, along with a different number of SNAP molecules as observed in 


NSF-mediated SNARE complex disassembly 


Based on our cryo-EM structures, we propose a working model of NSF-mediated SNARE complex 
disassembly (Fig. 8). Starting with the cis-SNARE complex (that is, with both transmembrane 
domains in the same membrane), the first step (Fig. 8a, b) is the binding of SNAP molecules. Our 
cryo-EM structures suggest that, depending on the particular component proteins of the SNARE 
complex, up to four SNAP molecules are present. A stoichiometry higher than 4:1 is unlikely, due 
to packing considerations. Dozens of SNARE proteins exist in a eukaryote, but there are only a 
few SNAP and very few NSF species. We propose that one NSF/SNAP species can disassemble all 
SNARE complexes (including yeast SNAREs) using shape and characteristic electrostatic pattern 
recognition of SNARE complexes by SNAPs, rather than specific ‘lock into key’ interactions 
(Fig. 6). 

The second step of our model (Fig. 8b, c) is the binding of NSF, that is, the formation of the 
20S supercomplex. Upon binding to the SNAP-SNARE subcomplex, which acts like a fastener, both 
NSF ATPase rings are tightened, akin to a loaded spring (Extended Data Fig. 7d-f and 
Supplementary Video 4). The N domains are immobilized due to the interactions with SNAP 
molecules; characteristic electrostatic patterns may also play a role in these interactions. The 
opposing twists of SNAP molecules and the SNARE four α-helix bundle in both the 20S and V7-20S 
supercomplexes (Fig. 6a), along with the existence of four distinct molecular states (Fig. 4), 
suggest that the 20S supercomplex exerts a torque to unwind or loosen the SNARE complex while 
switching between the four states. This step requires ATP hydrolysis to initiate the movement of 
the NSF N domains, and subsequent force transmission via SNAPs. A second possibility is that the 
four states represent independent binding modes: each would apply a force on its own upon 
binding to the SNAP-SNARE subcomplex to unwind or loosen the SNARE complex. 

The final step of our model (Fig. 8c, d) is the hydrolysis of multiple ATP molecules. We 
observed large conformational differences of NSF between the ATP- and ADP-bound states (Fig. 1c, 
e and Extended Data Fig. 7a-c), suggesting large motions of NSF upon ATP hydrolysis. The changes 
of the D1 ring involve a 20 Å vertical movement of the pore loops (Extended Data Fig. 6a, b), 
which may apply a shear force to the SNARE complex; the opening of the D1 ring from a split 
washer to an open flat washer, along with a motion of the N domains towards the sides of the 
ATPase rings, could apply a pulling force. Taken together, these forces completely disassemble 
the SNARE complex into individual SNARE proteins. Interestingly, exactly two N domains are 
flipped in ADP-bound NSF (chains A and B). These two chains are at the lower edge of the D1 
split washer (Fig. 2c). During the transition towards the ADP-bound flat ring, the D1 domains of 
chains A and B rotate outwards (Supplementary Video 4 and Extended Data Fig. 6b), so they might 
be in a position favoured for the motion of the N domains. Also in our 20S structure, two N 
domains bind to one SNAP, and together they may be able to exert a larger force compared to 1:1 
binding. We speculate that state II is a precursor of ADP-bound NSF. Finally, the opening of the 
D1 ring in ADP-bound NSF may serve as an exit for individual SNARE proteins since a pore 
translocation mechanism is unlikely (see Supplementary Discussion). Our model predicts that the 
release of individual SNARE proteins and SNAPs would initiate nucleotide exchange and restart 
the cycle. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS


Protein expression and purification. Chinese hamster NSF with a tobacco etch 
virus (TEV) protease cleavable N-terminal His-tag was expressed from pPROEX-1 
vector in E. coli BL21(DE3)-RIL cells (Agilent Technologies) at 25 °C overnight
using autoinducing LB medium. After collecting the cells by centrifugation, they
were resuspended in lysis buffer (50 mM TrisCl, pH 8.0, 300 mM NaCl, 50 mM 
imidazole, and 0.5 mM TCEP), and were subjected to sonication and centrifugation. 
The cleared lysate was loaded onto a HisTrap column (GE Healthcare), and washed 
with lysis buffer. NSF was eluted using elution buffer (lysis buffer supplemented 
with 350 mM imidazole). The fresh elution was pooled, concentrated, and supplemented
with final concentrations of 1 mM EDTA, 1 mM ATP, and 10% glycerol
immediately to prevent aggregation and precipitation. The concentrated protein 
was immediately loaded onto a Superdex 200 16/60 column (GE Healthcare) that 
was pre-equilibrated with SEC Buffer (50 mM TrisCl, pH 8.0, 150 mM NaCl, 1 mM 
EDTA, 1 mM ATP, 1 mM DTT, and 10% glycerol). Fractions containing hexameric
NSF were separated from aggregated NSF that eluted in the void volume.
Concentrated hexameric NSF was loaded onto a Superdex 200 16/60 column that was
pre-equilibrated with monomerization buffer (50 mM sodium phosphate, pH 8.0,
150 mM NaCl, 0.5 mM TCEP). Depending on the amount of protein, this step
needed to be repeated 3-4 times until all the NSF proteins were eluted as
monomers (Extended Data Fig. 1a, b). Monomerized NSF fractions were pooled 
and supplemented with TEV protease, and incubated overnight at 4 °C. Tag-cleaved 
NSF was run through a HisTrap column to remove the TEV protease and the 
cleaved tags. Note that our method differs from that previously reported, which 
used apyrase to monomerize NSF. Monomerized NSF was frozen and stored at
−80 °C for future use. To reassemble the hexameric NSF in a specific nucleotide
state, for example, ATP or ADP, monomerized NSF was dialysed at 4 °C overnight
in reassembly buffer (50 mM TrisCl, pH 8.0, 150 mM NaCl, 1 mM EDTA,
1 mM nucleotide, 1 mM DTT, and 10% glycerol). The concentrated dialysate was
loaded onto a Superdex 200 16/60 column that was pre-equilibrated with reas- 
sembly buffer for a final clearance. For sample vitrification, the final SEC buffer did 
not contain glycerol, and the protein samples were concentrated to ~15 mg ml⁻¹.
The functional activity of purified NSF was tested by a gel-based disassembly assay 
(Extended Data Fig. 1g) and by a fluorescence-dequenching based SNARE com- 
plex disassembly assay described below. Note that some of the classical disassembly 
assays in the field were performed with NSF and nucleotide initially in the absence 
of Mg²⁺, and then the reaction was triggered by the addition of Mg²⁺ (for example,
reference 36), establishing that NSF is active when initially prepared in complex
with nucleotide in the absence of Mg²⁺.

Rat αSNAP was expressed and purified as described before.

Truncated neuronal SNARE complex containing rat SNAP-25 (amino acid 
range 7-83), SNAP-25 (amino acid range 141-204), syntaxin-1A (amino acid range 
191-256), and His-tagged synaptobrevin-2 (amino acid range 28-89) was cloned 
in the Duet expression system (Novagen). The complex was expressed in E. coli
BL21(DE3) cells at 30 °C overnight using autoinducing LB medium. After collecting
the cells by centrifugation, they were resuspended in lysis buffer (50 mM
sodium phosphate, pH 8.0, 300 mM NaCl, 50 mM imidazole, and 0.5 mM TCEP), 
and were subjected to sonication and centrifugation. The cleared lysate was loaded 
onto a HisTrap column (GE Healthcare), and washed with lysis buffer, urea buffer 
(50 mM sodium phosphate, pH 8.0, 300 mM NaCl, 50 mM imidazole, 7.5 M urea, 
and 0.5 mM TCEP) and wash buffer (lysis buffer supplemented with additional
10 mM imidazole) sequentially. The SNARE complex was eluted using elution buffer 
(lysis buffer supplemented with 350 mM imidazole). The fresh elution was pooled 
and dialysed in dialysis buffer (50 mM sodium phosphate, pH 8.0, 50 mM NaCl, 
3 M urea, and 0.5 mM TCEP) at 4 °C overnight for anion exchange chromatography.
The anion exchange chromatography was performed in buffers containing 3 M
urea (buffer A: 50 mM sodium phosphate, pH 8.0, 50 mM NaCl, 3 M urea, and
0.5 mM TCEP; buffer B: 50 mM sodium phosphate, pH 8.0, 500 mM NaCl, 3 M
urea, and 0.5 mM TCEP) using a linear gradient of NaCl. The peak fractions were 
pooled, concentrated, and loaded onto a Superdex 75 16/60 column (GE Healthcare) 
that was pre-equilibrated with SEC Buffer (50 mM TrisCl, pH 8.0, 150 mM NaCl, 
0.5 mM TCEP). The peak fractions were pooled and supplemented with TEV protease,
and incubated overnight at 4 °C. The tag-cleaved complex was subjected to a
second round of anion exchange chromatography (buffer A: 50 mM TrisCl, pH 8.0, 
50 mM NaCl, and 0.5 mM TCEP; buffer B: 50 mM TrisCl, pH 8.0, 500 mM NaCl, 
and 0.5 mM TCEP), and further purified by a final size-exclusion chromatography 
in SEC buffer. The purified truncated neuronal SNARE complex was tested for 
disassembly activity using an SDS-PAGE gel-based assay as previously described
(Extended Data Fig. 1g).

V7-SNARE complex consisting of rat full-length SNAP-25 (amino acid range 
1-206), syntaxin-1A (amino acid range 1-265), and His-tagged VAMP-7 (amino 
acid range 1-190) was cloned and expressed similarly to the truncated neuronal 
SNARE complex. However, the first anion exchange chromatography step for the
truncated neuronal SNARE complex was omitted since the V7-SNARE complex
showed less tendency to precipitate during purification. The other purification 
steps were the same as for the truncated neuronal SNARE complex. 

To assemble the 20S supercomplex, hexameric NSF loaded with AMPPNP was 
mixed with αSNAP and truncated neuronal SNARE complex in a mole ratio of
1:10:2, and incubated on ice for 30 min. The mixture was then concentrated and 
purified by size-exclusion chromatography using a Superdex 200 10/300 column 
(GE Healthcare) pre-equilibrated with 20S Buffer (50 mM TrisCl, pH 8.0, 150 mM 
NaCl, 1 mM AMPPNP, 1 mM EDTA, and 1 mM DTT) (Extended Data Fig. 1c-f). 
The resulting peak fractions containing the 20S supercomplex were pooled and 
concentrated to a final concentration of ~15 mg ml⁻¹ for cryo-EM studies. The
assembly and purification protocol for V7-20S supercomplex were essentially the 
same, except that hexameric NSF loaded with ATP was used. 

Sample vitrification. Initial attempts to image the reassembled NSF and 20S super- 
complex by cryo-EM were hindered by preferential orientations. 2D class averages 
showed that most of the particles were in end-on views, which is insufficient for 
structure determination (Extended Data Fig. 8a, inset). When the samples were 
deposited to a thin layer of carbon on top of the holey carbon grid, many side views 
were observed under FEI Tecnai TF20 operated at 200 kV. However, the contrast 
of the particles was significantly weaker. To ensure a sufficient number of side-view 
particles suspended in vitreous ice, samples were incubated in a buffer containing 
a small amount of detergent before plunge freezing. Multiple detergents were 
screened, and particles in Nonidet P-40 or its substitutes displayed the most side 
views. The final protein solution contained 0.05% of Nonidet P-40. However, this 
buffer condition dramatically reduced the particle density in the holes (~100-fold).
In order to achieve a reasonable particle distribution, we concentrated the protein 
sample to approximately 15 mg ml⁻¹. This concentration exceeds the usual requirement
for cryo-EM. However, even at such high concentration, we observed few
particles in areas where ice thickness was considered to be thin and ideal for cryo- 
EM. Therefore, images were collected from holes with ice that was considered to be 
thick by general cryo-EM consensus. Quantifoil Cu R1.2/1.3 grids (Quantifoil Micro 
Tools GmbH, Germany) were washed in chloroform for one hour and air-dried 
overnight. Aliquots of 2.5 µl samples were loaded onto the grids. Because the buffer
contained detergent, the protein solution spread relatively well over the grid surface
without glow discharge. Grids were blotted for 3 to 4 s and plunge frozen in
liquid ethane cooled by liquid nitrogen using a FEI Vitrobot (FEI Company). The 
same vitrification conditions were applied to all samples (ATP- and ADP-bound 
NSF, 20S and V7-20S). 

Cryo-EM data collection. Grids were transferred to TF30 Polara equipped with a 
field emission source and operated at 300 kV. Images were recorded on a Gatan 
K2 Summit direct electron detector operated in super-resolution counting mode 
following the established dose fractionation data acquisition protocol. Images
were recorded at a nominal magnification of 31,000×, corresponding to a calibrated
super-resolution pixel size of 0.61 Å on the specimen. The dose rate on the
detector was set to be ~8.2 counts (corresponding to ~10.9 electrons) per physical
pixel per second. At this setting, the coincidence loss is about 11.5% and the
total loss, including the loss due to imperfect detector quantum efficiency, is about
25%. The total exposure time was 6 s, leading to a total accumulated dose of 44 e⁻
per Å² on the specimen. Each image was fractionated into 30 frames, each with an
accumulation time of 0.2 s. Dose-fractionated images were recorded using a semi-automated
acquisition program UCSFImage4 (written by Xueming Li). Defocus
values ranged from −1.8 to −2.8 µm.
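
The dose figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that one physical detector pixel spans two super-resolution pixels in each direction (that is, 1.22 Å on the specimen), and the variable and function names are hypothetical rather than part of any acquisition software.

```python
# Values quoted in the data collection paragraph.
super_res_pixel_A = 0.61       # calibrated super-resolution pixel size (Å)
electrons_per_pixel_s = 10.9   # electrons per physical pixel per second
counts_per_pixel_s = 8.2       # counts per physical pixel per second
exposure_s = 6.0               # total exposure time (s)
n_frames = 30                  # frames per dose-fractionated image

# Assumption: a physical pixel is 2 x 2 super-resolution pixels.
physical_pixel_A = 2 * super_res_pixel_A


def total_dose(e_per_pixel_s, exposure, pixel_A):
    """Accumulated dose on the specimen in electrons per square ångström."""
    return e_per_pixel_s * exposure / pixel_A ** 2


dose = total_dose(electrons_per_pixel_s, exposure_s, physical_pixel_A)
frame_time_s = exposure_s / n_frames
dose_per_frame = dose / n_frames
# Fraction of incident electrons that are not registered as counts.
total_loss = 1.0 - counts_per_pixel_s / electrons_per_pixel_s

print(f"total dose     : {dose:.1f} e-/Å^2")
print(f"frame time     : {frame_time_s:.2f} s")
print(f"dose per frame : {dose_per_frame:.2f} e-/Å^2")
print(f"total loss     : {100 * total_loss:.1f} %")
```

Under these assumptions the numbers reproduce the rounded values in the text: about 44 e⁻ per Å² in total, 0.2 s per frame, and a total loss of about 25%.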

Image processing. Super-resolution counting images were 2 × 2 binned by Fourier
cropping for motion correction, resulting in a pixel size of 1.22 Å. Motion-corrected
frames were summed to a single micrograph for subsequent processing. Defocus
values were determined for each micrograph using CTFFIND3. A semi-automated
procedure similar to a previous work was used to pick particles. Briefly, for each
data set, ~2,000 particles were manually picked to calculate 2D class averages. Unique
2D class averages were selected as templates for automated particle picking.
Picked particles were visually inspected, and obviously falsely picked particles were
removed. 2D classification was performed using RELION and SPIDER. Initial 3D
models were generated using the common lines method. 3D classification and
gold-standard refinement were performed using RELION. The initial model from
the common lines method was low-pass filtered to 60 Å and used as the starting
model for the initial auto-refinement, which generated a consensus model. This
consensus model was again filtered to 60 Å and used for 3D classification. We used
prior knowledge of NSF to distinguish ‘good’ classes from ‘bad’ classes: 3D class 
averages that showed incorrect features of NSF were considered as bad, including 
the wrong numbers of apparent D1 and/or D2 domains or a seriously deteriorated 
D2 ring. All good class averages had the correct number (six) of all the domains and
well-defined D2 ring density. 
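
The 2 × 2 binning by Fourier cropping mentioned at the start of this section can be illustrated with a minimal NumPy sketch. This is a generic version of the technique, not the code used in this work; the function name `fourier_crop` is hypothetical, and the image dimensions are assumed to be divisible by the binning factor.

```python
import numpy as np


def fourier_crop(image, factor=2):
    """Downsample a 2D image by keeping only the central (low-frequency)
    region of its Fourier transform."""
    ny, nx = image.shape
    new_ny, new_nx = ny // factor, nx // factor
    # Shift the zero frequency to the centre of the array.
    ft = np.fft.fftshift(np.fft.fft2(image))
    y0 = ny // 2 - new_ny // 2
    x0 = nx // 2 - new_nx // 2
    cropped = ft[y0:y0 + new_ny, x0:x0 + new_nx]
    # Undo the shift and invert; dividing by factor**2 keeps the mean
    # intensity of the image unchanged.
    binned = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    return binned / factor ** 2


# Synthetic stand-in for a super-resolution frame (0.61 Å per pixel).
rng = np.random.default_rng(0)
frame = rng.normal(size=(64, 64))
binned = fourier_crop(frame, factor=2)
print(binned.shape, "pixel size:", 0.61 * 2, "Å")
```

Unlike averaging neighbouring pixels in real space, cropping in Fourier space discards the frequencies beyond the new Nyquist limit without attenuating the ones that are kept.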

For 20S, good 3D class averages showed the correct number of all the domains,
and well-defined densities of the αSNAP-SNARE spire and D2 ATPase ring. Particles
in distinct good 3D classes were refined separately, yielding four final reconstructions
(Fig. 4a). The four final reconstructions belong to three patterns (I, II, and III)
based on the mode of interactions between αSNAP and N domains. When we
tested different settings of the 3D classification procedure, other patterns were
either rarely populated or showed deteriorated features in some domains, prevent- 
ing refinement of the pattern to a reasonable resolution. Therefore, these other 
patterns may represent minor populations if they actually exist. Note that when we 
sub-classified pattern I or II, we also observed multiple states, but the differences 
between them were much smaller than those between states IIIa and IIIb. Therefore
we did not refine those subclasses individually. The observed conformational het- 
erogeneity may have prevented crystallization of the 20S supercomplex despite 
extensive efforts by many laboratories. 

No symmetry was assumed throughout the entire process for all NSF and 20S 
reconstructions (except for demonstrating the detrimental effect of including 
symmetry, Extended Data Fig. 5, see also Supplementary Discussion). The solvent 
area of raw particles was set at zero for 3D classification (zero_mask = true). The 
solvent area was filled with random noise for auto-refinement (zero_mask = false). 
The result from auto-refinement was slightly worse when zero_mask was set to true. 
For ATP-bound NSF sample, continued refinement using the same particles but 
summed from frame number 2 to number 18 (the first frame is number 1) improved 
the resolution from 4.4 Å to 4.2 Å. As recommended by the RELION documentation,
“relion_postprocess” was used to generate a soft mask, which was applied to
the two half maps before FSC was calculated. The resolution reported in Extended 
Data Table 1 was estimated from the masking-effect-corrected FSC curve using the 
FSC = 0.143 criterion. B-factor sharpening was carried out by “relion_postprocess”
implementing an automated B-factor fitting algorithm. The local resolution was
estimated using ResMap on unsharpened and unfiltered maps. Detailed information
for each final reconstruction is summarized in Extended Data Table 1.

Model building and refinement. The two half-maps of ATP-bound NSF were
summed to a single unsharpened map. The summed map was then sharpened
manually using a series of negative B-factors by XMIPP. The sharpened maps
were used for model building in COOT. The crystal structure of the D2 domain
of NSF (ref. 19, PDB accession code: 1NSF) in complex with ATP-Mg²⁺ was first docked
as the hexamer observed in the crystal structure into the density map as a single 
rigid body. Minor adjustments of the backbone and side chains of the D2 domain 
were manually performed using COOT. The nucleotide conformation observed in 
the crystal structure matched the density of the cryo-EM map (Extended Data Fig. 3f), 
demonstrating that the absence of Mg²⁺ in the assembly of ATP-bound NSF did
not affect the conformation of the nucleotide. A homology model of the D1 domain 
protomer generated by SWISS-MODEL based on the crystal structure of the N-D1
domain of p97 (PDB accession code: 1E32) was initially placed into the best of the
six copies of D1 densities. However, the detailed fit to the observed densities of the
D1 domains was relatively poor and it required complete retracing of the main chain
and side chains in order to obtain a better fit (starting from the structure of the D2 
domain of p97 as the homology model did not produce a better initial fit to the 
density map and it would have still required complete retracing). The rebuilt D1 
domain was then docked into the other five copies of D1 density. Densities for ATP 
were visible in 11 out of the 12 possible nucleotide-binding pockets (no clear density 
for nucleotide was seen in the D1 domain of chain F), and models of ATP were fit to 
these densities. Rigid body minimization was carried out for the α/β and α subdomains
of each D1 domain separately using COOT, followed by manual adjustment,
residue by residue. To complete the model, the crystal structure of the NSF N
domain (PDB accession code: 1QCS) was docked into the corresponding densities
of the unsharpened map without any fitting; the quality of these densities was
good enough to determine the approximate positions of the N domains. The partial 
model containing the D1 and D2 domains was subjected to reciprocal space refine- 
ment. Amplitudes of the summed map were corrected by frequency-dependent 
scaling factors determined by comparing the experimental maps with a reference 
map calculated from the full model. A soft-edged mask was generated based on 
the built atomic model (including only the D1 and D2 domains) and applied to the 
scaled map. Most solvent regions and the density corresponding to N domains 
were masked out. The masked maps were put into an artificial unit cell with P1 
symmetry and converted to MTZ format using CCP4 program sftools. The resulting
reflection files were used to perform maximum likelihood refinement using
PHENIX with secondary structure restraints, reference model restraints, and automatic
optimization of data and stereochemistry weights. The reference models
were generated from the built models using the geometry minimization function in 
PHENIX. The refined D1 and D2 domains were combined with the docked N 
domains to produce a complete model. 
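
The manual sharpening with negative B-factors described at the start of this section follows a standard convention: each Fourier amplitude is multiplied by exp(−B s²/4), where s = 1/d is the spatial frequency in Å⁻¹, so a negative B boosts high-resolution terms. The NumPy sketch below illustrates that convention only; it is not the XMIPP implementation, and the function name `sharpen_map` and the random test volume are hypothetical.

```python
import numpy as np


def sharpen_map(volume, pixel_size_A, b_factor_A2):
    """Scale Fourier amplitudes of a 3D map by exp(-B * s^2 / 4).

    s is the spatial frequency in 1/Å. A negative B-factor sharpens
    the map; a positive one blurs it.
    """
    freqs = [np.fft.fftfreq(n, d=pixel_size_A) for n in volume.shape]
    sz, sy, sx = np.meshgrid(*freqs, indexing="ij")
    s2 = sx ** 2 + sy ** 2 + sz ** 2
    ft = np.fft.fftn(volume)
    return np.fft.ifftn(ft * np.exp(-b_factor_A2 * s2 / 4.0)).real


# Synthetic stand-in for an unsharpened map sampled at 1.22 Å per voxel.
rng = np.random.default_rng(0)
unsharpened = rng.normal(size=(16, 16, 16))
# Try a series of negative B-factors (Å^2), as one would when tuning by eye.
for b in (-50.0, -100.0, -123.0):
    sharpened = sharpen_map(unsharpened, 1.22, b)
    print(b, round(float(sharpened.std()), 2))
```

Because sharpening amplifies noise together with signal, a sharpened map is usually low-pass filtered near its estimated resolution; that step is omitted here for brevity.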

To model ADP-bound NSF, the crystal structure of the D2 domain and the 
refined cryo-EM structure of the D1 domain (from ATP-bound NSF) were docked 
into the sharpened map. Rigid body minimization was carried out for the α/β and α
subdomains of each ATPase domain (six D1 and six D2) separately using COOT.
Densities for ADP were clearly visible for four nucleotide-binding sites of the D2
domains (except those in chains A and F). The resulting model was refined using
PHENIX as described above for the ATP-bound structure. The model was completed
by docking the N domains into the unsharpened map, without fitting and
further refinement; the quality of these densities was good enough to determine the 
approximate positions of the N domains. 

For the 20S supercomplex structure, the model of ATP-bound NSF containing the
D1 and D2 domains was docked into each of the four sharpened maps corresponding
to the four states of 20S (Fig. 4). Rigid body minimization was carried out
for the α/β and α subdomains of each ATPase domain (six D1 and six D2) separately
using COOT. The crystal structure of the N domain (PDB accession code:
1QCS) was docked into the peripheral densities. The SNARE complex was easily
identified and modelled using the crystal structure of truncated neuronal SNARE
complex (PDB accession code: 1N7S). A homology model of αSNAP generated
by SWISS-MODEL based on the crystal structure of yeast Sec17 (PDB accession
code: 1QQE) was docked into the remaining densities surrounding the SNARE
complex. The resulting models were refined in reciprocal space using PHENIX as
described above. 

The fitting of the V7-20S supercomplex was performed similarly, except that 
only two αSNAPs and four out of six N domains were modelled based on the
observed densities in the cryo-EM map. The crystal structure of the closely related
truncated neuronal SNARE complex was used for model building since there is
no crystal structure of the V7-SNARE complex available. The Habc domain of 
syntaxin-1A and the Longin domain of VAMP-7 were not visible. The model of 
V7-20S supercomplex was not further refined in reciprocal space. 

For cross-validation of overfitting, we followed the procedures published before. 
Briefly, the coordinates of refined ATP-bound NSF model (only D1 and D2 domains) 
were displaced randomly by 0.1 Å using PHENIX (PDB Tools) to remove potential
model bias. The displaced model was then refined against one of the half maps in 
reciprocal space. FSC curves were calculated between the resulting model and half 
map 1 (‘work’, that is, used for refinement), the resulting model and half map 2 
(‘free’, that is, not used for refinement), and the resulting model and the summed 
map. The lack of separation between work and free FSC curves suggested that the 
model was not overfitted. 
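
The Fourier shell correlation (FSC) used for the work and free comparison above, and for the FSC = 0.143 resolution estimate, can be sketched as follows. This is a minimal NumPy illustration of the general definition, not the RELION or PHENIX implementation; the function names are hypothetical, cubic maps on the same grid are assumed, and no masking correction is applied.

```python
import numpy as np


def fsc_curve(map1, map2):
    """Fourier shell correlation between two cubic maps of equal size.

    Returns one value per shell, indexed by radius in Fourier pixels.
    """
    n = map1.shape[0]
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)
    k = np.fft.fftfreq(n) * n  # integer frequency indices
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)).astype(int)
    fsc = np.zeros(n // 2)
    for r in range(n // 2):
        sel = shell == r
        num = np.sum(f1[sel] * np.conj(f2[sel])).real
        den = np.sqrt(np.sum(np.abs(f1[sel]) ** 2) * np.sum(np.abs(f2[sel]) ** 2))
        fsc[r] = num / den if den > 0 else 0.0
    return fsc


def resolution_at(fsc, box_size, pixel_size_A, threshold=0.143):
    """Resolution (Å) of the first shell whose FSC falls below the threshold."""
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0 or below[0] == 0:
        return None  # never crosses, or undefined at the origin
    return box_size * pixel_size_A / below[0]


# Synthetic example: a shared signal plus independent noise in each half map.
rng = np.random.default_rng(0)
signal = rng.normal(size=(16, 16, 16))
half1 = signal + rng.normal(size=signal.shape)
half2 = signal + rng.normal(size=signal.shape)
print(np.round(fsc_curve(half1, half2), 2))
```

For the cross-validation test, the same function would be applied to a map calculated from the refined model against half map 1 (work) and half map 2 (free); closely overlapping curves argue against overfitting.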

MolProbity was used for evaluating the geometry of the models. The model
statistics of the refined models are summarized in Extended Data Table 2 (note 
that R values are arbitrary since they depend on the exact definition of the P1 cell, 
so they are not provided in the table). Molecular graphics and analyses were performed
with either PyMOL (The PyMOL Molecular Graphics System, Version
1.7.0.5 Schrödinger, LLC.) or UCSF Chimera (Chimera is developed by the Resource
for Biocomputing, Visualization, and Informatics at the University of California,
San Francisco, supported by NIGMS P41-GM103311). All density maps presented
were B-factor sharpened and filtered by RELION using the B-factors listed in Extended
Data Table 1, unless otherwise mentioned.

Fluorescence dequenching-based SNARE complex disassembly assay. The
details of the assay were published previously. Briefly, soluble rat neuronal SNARE
complex containing syntaxin-1A (amino acid range 1-265, S249C, K253C), full- 
length SNAP-25 (amino acid range 1-206), and synaptobrevin-2 (amino acid range 
1-96), was labelled with Oregon Green (forming covalent linkages with two cysteine 
residues close to the C terminus of syntaxin-1A, and four native cysteine residues of 
SNAP-25). The disassembly reactions were carried out using FlexStation II (Molecular 
Devices) in a 384-well plate with a reaction volume of 60 µl. Each condition (different
αSNAP mutants) was divided into four replicas, and the average was plotted.
Final concentrations of 400 nM Oregon Green-labelled SNARE complex, 2 µM
αSNAP, and 85 nM NSF were included in the reaction buffered with 50 mM TrisCl,
pH 8.0, 20 mM NaCl, 2 mM ATP, 2 mM MgCl₂, and 0.5 mM TCEP. To measure
the initial SNARE complex disassembly rate, the first 20 data points after adding
the NSF were used for linear regression analysis, except for αSNAP mutants
D217A/E249K/E252K/E253K/AC, and K122E/K163E, for which the first 1,000
data points were used because the slope was close to zero. Note that the disassembly
rate of wild type proteins (Fig. 5f) is comparable to that of previous experiments,
illustrating that our purification method of NSF produces fully active proteins. 
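
The initial-rate estimate described above (a straight-line fit to the first 20 data points after NSF addition) can be sketched as follows. The traces here are synthetic, the function names are hypothetical, and fitting the average of the four replicas is an assumption; the text does not state whether the regression was run on averaged or individual traces.

```python
import numpy as np


def mean_trace(replicates):
    """Average replicate fluorescence traces point by point."""
    return np.mean(np.asarray(replicates, dtype=float), axis=0)


def initial_rate(time_s, fluorescence, n_points=20):
    """Slope of a least-squares line through the first n_points readings."""
    t = np.asarray(time_s, dtype=float)[:n_points]
    f = np.asarray(fluorescence, dtype=float)[:n_points]
    slope, _intercept = np.polyfit(t, f, 1)
    return slope


# Synthetic data: four noisy replicas of a trace that rises and then plateaus.
rng = np.random.default_rng(0)
time_s = np.arange(200) * 2.0                      # hypothetical 2 s sampling
ideal = 100.0 + 50.0 * (1.0 - np.exp(-time_s / 120.0))
replicas = [ideal + rng.normal(scale=0.5, size=ideal.size) for _ in range(4)]

rate = initial_rate(time_s, mean_trace(replicas), n_points=20)
print(f"initial rate: {rate:.3f} fluorescence units per second")
```

For conditions with a slope close to zero, a longer window (the text uses the first 1,000 points) reduces the influence of noise on the fitted slope and can be selected through `n_points`.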

Composition-gradient multi-angle light scattering (CG-MALS). The experi- 
ments were performed in Tris buffered saline (TBS; 17 mM Tris, 50 mM NaCl, 
pH 8.0) with or without 0.1 mM TCEP. After dilution to the appropriate stock concentration,
proteins were filtered using a syringe-top filter (0.02 µm, Anotop, Whatman).
CG-MALS experiments were performed with a Calypso II composition gradient 
system (Wyatt Technology Corporation) to prepare different compositions of protein 
and buffer, and deliver them to an online UV/Vis detector (Waters Corporation) 
and DAWN HELEOS II multi-angle light scattering detector (Wyatt). Inline filter 
membranes with 0.03 µm pore size were installed in the Calypso system for additional
sample and buffer filtration. An automated CG-MALS method was performed,
consisting of single component concentration gradients for each species to quantify
any self-association and a dual-component ‘crossover’ gradient to assess the interaction
between αSNAP and each SNARE complex (Extended Data Fig. 10a). For
each composition, the Calypso system prepared an aliquot of protein solution, 
injected it into the detectors and stopped the flow for 60-240 s to allow the reaction 
to come to equilibrium within the MALS detector flow cell. Equilibrium light scat- 
tering and concentration data were fit to an appropriate association model using 
the CALYPSO software (Wyatt).

Extended Data Figure 1 | Purification of recombinant NSF bound to
specific nucleotide and 20S supercomplex. a, Schematic diagram of the 
purification steps of NSF. Chromatography columns and buffer conditions are 
provided. b, Size-exclusion chromatograms corresponding to the coloured 
steps in panel a. Major peaks are labelled. c, A scheme showing the purification 
steps of the 20S supercomplex. d, Size-exclusion chromatogram of the 20S
supercomplex. Major peaks are labelled. e, SDS-PAGE gel of fractions collected
in panel d. The samples were not boiled. f, SDS-PAGE gel of the same fractions 
as in panel e. The samples were boiled. g, SDS-PAGE-based SNARE 
disassembly assay of the truncated neuronal SNARE complex. This complex is 
stable in SDS without boiling. Disassembly by NSF/αSNAP is observed as a
function of time.

Extended Data Figure 2 | 3D reconstruction of ATP-bound NSF by single-particle
cryo-EM. a, A representative electron micrograph (out of 1,150
micrographs) of ATP-bound NSF particles in vitreous ice. b, Selected 2D class
averages (6 out of 50). c, Plots of the angular distribution of particle projections.
The radius of the sphere at each projection direction is proportional to the 
number of particle images assigned to it. Two alternative views are shown, with 
either the z axis or the y axis pointing out towards the viewer. Two 
corresponding re-projection images of the final density map are shown under 
the plots. d, Selected slice views of the final reconstruction. The slice numbers
are indicated. e, FSC curves for the 3D density map after RELION post-processing.
The resolution is estimated to be 4.2 Å by the gold-standard
refinement criterion, as indicated by the red arrow. The FSC curve between the 
refined atomic model and the 3D density map is shown in blue. f, FSC curves for 
cross-validation. Black, model versus summed map (full data set); green, model 
versus half map 1 (used for test refinement); orange, model versus half map 2 
(not used for test refinement). See Methods for details. g, 3D density map 
coloured according to the local resolution as estimated by ResMap.

Extended Data Figure 3 | Representative densities from the cryo-EM
reconstructions of ATP-bound NSF and the 20S supercomplex. a-g, ATP-bound
NSF. Density maps were sharpened by XMIPP using a B-factor of
−123 Å². a, Representative densities (black mesh, 7.8σ) for an α-helix and a
β-strand of the D1 domain with the refined model (coloured sticks)
superposed. b, Representative density (black mesh, 7.0σ) of the β-sheet of the
α/β subdomain of D1 (chain C) with the refined model (yellow Cα ribbon)
superposed. c, Density (black mesh, 7.0σ) and model (yellow Cα ribbon) for the
D1 and D2 domains of chain B. Note that the linker between the two domains is
well resolved. d, De novo model (yellow cartoon) of the D1 domain built from
the cryo-EM density map (black mesh, 7.0σ). The arrangement of the
subdomains and nucleotide is illustrated in the inset. The pore loop (YVG
motif) and two α-helices: α2 from the α/β subdomain, and α7 from the α
subdomain are highlighted in the red dotted boxes. e, Density (black mesh,
6.5σ) and model (yellow Cα ribbon) of the ATP binding pocket of the D1
domain (chain C). Motifs that are typical for AAA+ ATPases are indicated.
f, Superposition of the crystal structure of the ATP-bound D2 domain with
Mg²⁺ (ref. 19, PDB accession code: 1NSF, coloured sticks and balls), and the
cryo-EM density map (black mesh, 7.6σ) of ATP-bound NSF (density of chain
C). The crystal structure was docked into the density as a rigid body without any
refinement. Note that no Mg²⁺ was present in the samples for cryo-EM studies,
but the ATP molecule and the protein coordinates from the crystal structure
match the cryo-EM density well. g, Nucleotide-binding sites of the D1 domains
from ATP-bound NSF. The density (translucent surface, chains A-E: 8.2σ,
chain F: 5.0σ) of each D1 domain is shown together with the built model in
ribbon representation. The nucleotide-binding pockets are highlighted by
dotted circles. Five out of the six D1 domains show clear density for ATP.
h-l, Representative densities (translucent surface, 4.7σ) from the
reconstruction for state I of the 20S supercomplex with the model (cartoon) 
superposed. All densities are representative except for the N domain in panel 
j, which represents the better-resolved half of the N domain densities (12 out of 
24 cases).

Extended Data Figure 4 | 3D reconstruction of ADP-bound NSF by single-particle
cryo-EM. a, A representative electron micrograph (out of 840
micrographs) of ADP-bound NSF particles in vitreous ice. b, Selected 2D class
averages (6 out of 30). c, Selected focused 2D class averages (5 out of 10).
The first image shows the focused classification mask, which locates the flipped 
N domains. d, Plots of angular distribution of particle projections. The radius of 
the sphere at each projection direction is proportional to the number of particle
images assigned to it. Two alternative views are shown, with either the z axis or
the y axis pointing out towards the viewer. Two corresponding re-projection 
images of the final density map are shown under the plots. e, Selected slice 
views of the final reconstruction. Slice numbers are indicated. f, FSC curve for 
the 3D density map after RELION post-processing. The resolution is estimated 
to be 7.6 Å by the gold-standard refinement criterion. g, 3D density map
coloured according to the local resolution as estimated by ResMap.

Extended Data Figure 5 | Detrimental effect of imposing C6 symmetry on
the reconstruction of ADP-bound NSF and C3 symmetry on the
reconstruction of the 20S supercomplex. a, For the NSF maps, in order to
visualize densities of the N domains, an unsharpened map is displayed
(translucent surface, C1: 1.2σ, C6: 0.6σ) together with the sharpened map
using no symmetry (C1) or C6 symmetry during reconstruction (coloured
surface, C1: 5.9σ, C6: 7.0σ). For the reconstruction that uses C6 symmetry,
symmetric densities for the N domains at top and side positions appear in the
unsharpened map; however, these densities cannot be matched to the crystal
structure of the N domain. Likewise, the D1 domains appear compressed and 
cannot be fit well using the structure of the D1 domain that we obtained by
asymmetric reconstruction. b, Reconstruction of state I of the 20S
supercomplex without symmetry (C1) or with C3 symmetry. Density maps are
shown in coloured surfaces similar to Fig. 3 (C1: 4.7σ, C3: 4.9σ). The C3
averaging causes the D1 domains to display alternating up and down positions.
The density for the SNARE complex is a featureless rod without the
characteristic left-handed twist of the four α-helix bundle. Densities for only
three SNAPs emerge, but without any interpretable features (for example,
there are no grooves between helices), preventing a match with the crystal
structure of the known homologue of αSNAP, Sec17. The N domain densities
are weak and none of them exhibit the expected kidney shape. 
Extended Data Figure 6 | Comparison of AAA+ ATPase domains from 
ATP- and ADP-bound NSF structures. a, Unrolling of the ATPase domains 
of ATP-bound NSF. Two orthogonal views are shown. Individual chains are 
aligned based on the D2 domains (white) to show the split-washer arrangement 
of the D1 domains. b, Unrolling of the ATPase domains of ADP-bound NSF. 
Individual chains are aligned as in panel a. Dotted boxes in panels a and 
b highlight the α2 helices of the D1 domains in order to help with visualization 
of the relative positions. The six protomer chains are rainbow coloured as in 
Fig. 2. c, Superposition of the six D1 domains of ATP-bound NSF based on the 
α/β subdomains (white). The relative positions of the α7 helices from the α 
subdomains are illustrated in the inset. d, Corresponding superposition of the 
ADP-bound NSF D1 domains. e, Superposition of the five D1 domains 
(without chain F) of ATP-bound NSF (grey) and the six D1 domains of 
ADP-bound NSF (white) based on the α/β subdomains. The α7 helices from 
the α subdomains are highlighted by red dotted boxes. The relative translation 
of the α7 helices between the ATP-bound state and the ADP-bound state is 
shown in the inset. 
Extended Data Figure 7 | Comparison of ATP- and ADP-bound NSF 
structures, and ATPase domains of ATP-bound NSF and 20S 
supercomplex. a–c, Surface representations of the D2, D1 and N domains of 
ATP- and ADP-bound NSF (looking down from the N-terminal side of the 
NSF hexamer). The maximum diameters of the D2 and D1 rings, and the 
interface areas (calculated by PISA) between ATPase domains are indicated. 
Each protomer chain is coloured as in Fig. 2. The D1 ring is also shown in panel 
c and coloured white to help with visualization. d–f, The ATPase domains 
of the structure of the 20S supercomplex (state I) were superposed on the 
ATP-bound NSF using the D1 ring as the reference for the fit. Six protomer 
chains from ATP-bound NSF are rainbow coloured as in Fig. 2. The ATPase 
domains of the 20S supercomplex are coloured in white and grey. Note that the 
density of chain F in the reconstruction of ATP-bound NSF alone is poorly 
resolved (Fig. 1b), whereas in the 20S reconstruction it is well defined, although 
the overall resolution of the 20S reconstruction is lower. d, Side views. e, Top 
view of the D2 rings. Each individual D2 domain is labelled. Percentages of 
interface area change (from NSF to 20S) between the D2 domains are provided 
in the figure. The interface areas between the D2 domains are similar in the 
NSF and 20S structures, except for a significant increase (12%) between 
chains D and E for 20S compared to NSF alone. f, Top view of the D1 rings. 
Each D1 domain is labelled, with the split between chains A and F indicated by a 
black arrow. The translation of the α7 helix in the α subdomain of chain A is 
illustrated in the inset. Percentages of interface area change (from NSF to 20S) 
between the D1 domains are shown. Three interfaces stay the same; the one 
between chains A and B decreases, whereas those between chains E and F, and 
chains F and A increase significantly. 
Extended Data Figure 8 | 3D reconstruction of 20S supercomplex by 
single-particle cryo-EM. a, A representative electron micrograph (out of 610 
micrographs) of the 20S supercomplex in its original purification buffer 
recorded using the TF20 microscope and the TVIPS TemCam-F816 CMOS 
camera. The inset shows selected 2D class averages (5 out of 50). b, A 
representative electron micrograph (out of 2,459 micrographs) of 20S 
supercomplex in the buffer containing additional 0.05% Nonidet P-40 recorded 
using the TF30 Polara microscope and the Gatan K2 Summit detector. 
c–g, Results from this imaging condition. c, Selected 2D class averages (6 out 
of 50). d, Plots of angular distribution of particle projections. The radius of the 
sphere at each projection direction is proportional to the number of particle 
images assigned to it. Two alternative views are shown, with either the z axis 
or the y axis pointing out towards the viewer. Two corresponding re-projection 
images of the final density map are shown under the plots. e, Selected slice views 
of the final reconstruction. Slice numbers are indicated. Slices from different 
layers are framed in different colours: SNAREs and αSNAPs: yellow, N 
domains: pink, D1 ring: blue, and D2 ring: purple. f, FSC curves for the 3D 
density maps of the four states after RELION post-processing. The resolution 
ranges from 7.6 Å to 8.4 Å as estimated by the gold-standard 
refinement criterion. g, 3D density map coloured using local resolution 
estimated by ResMap. The right panel shows a cut-through view of the interior 
of the map. c–e and g are results from a subclass representing state I. 
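
As an illustration of how a resolution figure is read off an FSC curve, the following minimal sketch interpolates the spatial frequency at which the curve first drops below a threshold. The 0.143 threshold is the value conventionally paired with gold-standard FSC and is an assumption here, since the caption does not state it. The function name and the curve values are hypothetical and are not taken from the reported maps.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (in Å) at which an FSC curve first falls below `threshold`.

    freqs: spatial frequencies in 1/Å, ascending.
    fsc:   FSC values at those frequencies.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(freqs)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two bracketing shells.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


# Synthetic curve for illustration only (not data from this study).
freqs = [0.05, 0.10, 0.15, 0.20]
fsc = [0.99, 0.80, 0.10, 0.02]
res = resolution_at_threshold(freqs, fsc)
print(f"estimated resolution: {res:.1f} Å")
```

Real FSC curves are sampled on many more shells, and packages such as RELION also correct for masking effects before the threshold is applied, which this sketch does not attempt.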
Extended Data Figure 9 | Purification and 3D reconstruction of V7-20S 
supercomplex. a, Size-exclusion chromatogram of the V7-20S supercomplex. 
Major peaks are labelled. Only fraction 10 was concentrated and used for 
single-particle cryo-EM. b, SDS-PAGE gel of fractions collected in panel a. The 
samples were not boiled. c, SDS-PAGE gel of the same fractions as in panel b. 
The samples were boiled. d, A representative electron micrograph (out of 993 
micrographs) of the V7-20S supercomplex. e, Selected 2D class averages (6 out 
of 50). f, FSC curve for the 3D density map after RELION post-processing. 
The resolution is 8.0 Å as estimated by the gold-standard 
refinement criterion. 
Extended Data Figure 10 | CG-MALS characterization of αSNAP-SNARE 
subcomplex. a, Concentration gradient setup for the experiment that measures 
the binding between αSNAP and truncated neuronal SNARE complex. 
b, Measured molar mass for different components. Note that there were two 
independent runs for αSNAP over the specified concentration ranges. 
c, Measured molecular mass of αSNAP-SNARE (truncated) subcomplex 
converted from light scattering over the concentration gradient. The 
experimental data are represented by blue dots. Simulated curves with different 
αSNAP:SNARE (truncated) stoichiometry are shown. The best fit is 4:1. 
d, Measured molecular mass of the αSNAP-V7-SNARE subcomplex calculated 
from light scattering over the concentration gradient. The experimental data 
are represented by blue dots. Simulated curves with different αSNAP:V7-SNARE 
stoichiometry are shown. The best fit is 2:1. e, Calculated mole fractions 
of different αSNAP-SNARE (truncated) species over the concentration 
gradient based on 4:1 stoichiometry. f, Calculated mole fractions of different 
αSNAP-V7-SNARE species over the concentration gradient based on 2:1 
stoichiometry. 
Extended Data Table 1 | 3D reconstructions of NSF and 20S supercomplex by single particle cryo-EM 


                                 NSF (ATP)      NSF (ADP)      20S            V7-20S 
Electron microscope              TF30 Polara    TF30 Polara    TF30 Polara    TF30 Polara 
Accelerating voltage (kV)        300            300            300            300 
Defocus range (µm)               -1.8 to -2.8   -1.8 to -2.8   -1.8 to -2.8   -1.8 to -2.8 
Electron dose (e/Å²)             44 (26.4*)     44             44             44 
Pixel size (Å)                   1.2156         2.4312         2.4312         2.4312 
Particles processed              89,289         30,848         116,082        65,126 

                                 NSF (ATP)   NSF (ADP)   20S state I   20S state II   20S state IIIa   20S state IIIb   V7-20S 
Particles refined                50,781      12,830      29,717        21,489         15,249           14,991           32,100 
Resolution of unmasked map (Å)   6.4         9.2         8.6           8.9            9.4              9.2              9.2 
Resolution of masked map (Å)†    4.2         7.6         7.6           7.8            8.4              8.2              8.0 
Map sharpening B-factor (Å²)‡    -129        -479        -428          -601           -612             -734             -395 


*The accumulated dose of the first 18 frames. 
†Resolution of the masked map is estimated from masking-effect-corrected FSC curves. 
‡B-factors automatically determined by RELION. 
Extended Data Table 2 | Statistics of model refinement 


                          NSF (ATP)   NSF (ADP)   20S supercomplex 
Model composition 
  Total atoms             21,712      21,407      41,435 
  Peptide chains          6           6           14 
  Protein residues        2,906       2,887       5,431 
Refinement 
  Unit cell (P1) 
    a = b = c (Å)         311.2       311.2       311.2 
    α = β = γ (°)         90          90          90 

                          NSF (ATP)   NSF (ADP)   20S state I   20S state II   20S state IIIa   20S state IIIb 
  Resolution (Å)          4.2         7.6         7.6           7.8            8.4              8.2 
R.m.s. deviations 
  Bond lengths (Å)        0.008       0.011       0.009         0.009          0.009            0.009 
  Bond angles (°)         1.762       1.976       1.596         1.567          1.585            1.570 
Ramachandran plot 
  Favoured (%)            93.0        92.8        89.6          88.9           89.3             89.2 
  Outliers (%)            2.0         2.2         2.3           2.5            2.3              2.3 
MolProbity 
  Overall score           2.62        2.54        2.54          2.58           2.59             2.7 
  Rotamer outliers (%)    1.8         0.8         0.9           1.0            1.1              1.1 
Transport domain unlocking sets the 
uptake rate of an aspartate transporter 


Nurunisa Akyuz, Elka R. Georgieva, Zhou Zhou, Sebastian Stolzenberg, Michel A. Cuendet, George Khelashvili, 
Roger B. Altman, Daniel S. Terry, Jack H. Freed, Harel Weinstein, Olga Boudker & Scott C. Blanchard 


Glutamate transporters terminate neurotransmission by clearing synaptically released glutamate from the extracellular 
space, allowing repeated rounds of signalling and preventing glutamate-mediated excitotoxicity. Crystallographic 
studies of a glutamate transporter homologue from the archaeon Pyrococcus horikoshii, GltPh, showed that distinct 
transport domains translocate substrates into the cytoplasm by moving across the membrane within a central 
trimerization scaffold. Here we report direct observations of these ‘elevator-like’ transport domain motions in the 
context of reconstituted proteoliposomes and physiological ion gradients using single-molecule fluorescence 
resonance energy transfer (smFRET) imaging. We show that GltPh bearing two mutations introduced to impart 
characteristics of the human transporter exhibits markedly increased transport domain dynamics, which parallels an 
increased rate of substrate transport, thereby establishing a direct temporal relationship between transport domain 
motion and substrate uptake. Crystallographic and computational investigations corroborated these findings by 
revealing that the ‘humanizing’ mutations favour structurally ‘unlocked’ intermediate states in the transport cycle 
exhibiting increased solvent occupancy at the interface between the transport domain and the trimeric scaffold. 


Glutamate transporters, also termed excitatory amino acid transporters 
(EAATs), maintain glutamate concentration gradients across the 
cell membrane by coupling neurotransmitter uptake to symport of three 
sodium (Na+) ions and a proton and counter-transport of a potassium 
ion. Structural information on the EAAT family principally stems from 
investigations of GltPh, an aspartate and Na+ symporter from Pyrococcus 
horikoshii. Crystal structures of GltPh revealed that the homotrimeric 
protein is composed of a rigid, central trimerization scaffold 
that supports three peripheral transport domains containing the 
substrate-binding sites (Fig. 1a). Comparison of GltPh structures captured 
in distinct conformations suggests that within the trimerization scaffold, 
individual transport domains undergo relocations of approximately 
15 Å perpendicular to the membrane, providing substrate and ions 
alternating access to the extracellular (outward) and intracellular (inward) 
solutions (Extended Data Fig. 1a). 

Single-molecule imaging of GltPh provided direct evidence for bidirectional 
elevator-like transport domain motions. Consonant with 
findings obtained using double electron-electron spin resonance (DEER) 
spectroscopy, these measurements also showed that individual GltPh 
transport domains transition spontaneously between outward- and 
inward-facing conformations both when free of cargo (apo) and when 
bound to substrates. These transport domain motions exhibited 
heterogeneous dynamic behaviours, alternating between periods of rapid 
transitions and periods of quiescence, where the protein rests in either 
outward- or inward-facing states. In contrast to observations in structurally 
unrelated neurotransmitter sodium symporters, substrate binding 
decreased transport domain dynamics in GltPh by favouring the 
quiescent periods such that the frequency of domain motions 
converged to the substrate uptake rate. 

These findings led to the hypothesis that GltPh configurations observed 
in crystal structures, showing tight lock-and-key interactions 
between transport and trimerization domains, represent quiescent locked 
states with high substrate affinity, whereas the short-lived states sampled 
during dynamic periods are structurally distinct and likely have 
intrinsically lower substrate affinity (Extended Data Fig. 1b). This model 
posits that transport domain motions require a rate-limiting, structural 
unlocking process that changes the interface between the transport and 
trimerization domains, probably enabling solvent penetration into that 
interface. 

To assess the relationship between GltPh function, dynamics and 
structure, we employed smFRET imaging in the context of reconstituted 
proteoliposomes with physiological ion gradients. We compared 
wild-type GltPh to a gain-of-function, humanized (H) mutant R276S/M395R 
(H276,395-GltPh), which through unknown mechanisms exhibits 
a faster rate of substrate uptake. The smFRET experiments revealed 
that the mutations destabilized quiescent locked states. The resulting 
increase in dynamics paralleled a decreased affinity for substrate and an 
increased transport rate. Crystallographic analyses supported this 
observation, showing that the transport domains of H276,395-GltPh can 
adopt inward-facing conformations in which the transport 
domain-trimerization scaffold interface is strikingly more open than previously 
observed. Computational modelling further suggested that increased 
solvation by lipid or detergent hydrophobic tails in this interface 
probably facilitates the formation of such conformations. These observations 
provide a structural rationale for functional distinctions between 
GltPh and the human EAATs, and establish a kinetic framework for 
understanding how regulation can be achieved. 


Experimental design 

GltPh is a structural homologue of EAATs (~35% sequence identity) 
that preferentially transports aspartate over glutamate, with higher 
substrate binding affinity and slower uptake rate. It has been suggested 
Figure 1 | Transport rates and ‘elevator-like’ domain dynamics are 
correlated. a, Surface representation of the outward-facing GltPh showing the 
transport (blue) and scaffold (beige) domains. In one protomer, HP1 (yellow) 
and TM8 (dark blue) are emphasized as cartoons. In the enlarged 
substrate-binding site (right), mutated residues and aspartate are shown as sticks and 
ions are shown as spheres. b, Sequence alignment for HP1 and TM8 with 
mutation sites highlighted in pink. c, Aspartate uptake by unlabelled (black) 


that these distinctions may stem, in part, from the differential location 
of a conserved arginine residue that is proximal both to the 
substrate-binding site and the transport domain-trimerization scaffold interface. 
Although the location of this arginine can differ in the primary sequences 
of glutamate transporter homologues, its position is conserved in most 
family members (Fig. 1a, b, and Extended Data Fig. 1c). In human 
EAAT1, moving this arginine from transmembrane segment (TM) 8 
to helical hairpin (HP) 1 (where it is located in GltPh) strikingly 
increases substrate affinity and decreases uptake rate. Reciprocal mutagenesis 
of GltPh, whereby the arginine is moved from HP1 to TM8 (R276S/M395R), 
reduces aspartate affinity and increases the transport rate. 
We took advantage of this gain-of-function mutant to probe correlations 
between uptake rate and transport domain dynamics. 

We performed a comparative analysis of wild-type and H276,395-GltPh 
proteins using smFRET, in which elevator-like transport domain motions 
in individual GltPh trimers, bearing one donor and one acceptor 
fluorophore, were revealed as time-dependent changes in FRET efficiency 
(Extended Data Fig. 2a, b). To investigate such motions in the context 
of proteoliposomes, we labelled GltPh proteins with intramolecularly 
stabilized derivatives of the cyanine fluorophores Cy3 and Cy5 that 
exhibit intrinsically enhanced brightness and photostability (Extended 
Data Fig. 2c), eliminating the need for fluorophore protective agents 
that disturb lipid bilayer properties. The labelled proteins were reconstituted 
into liposomes in the absence of substrates for smFRET and bulk 
substrate uptake measurements. Bulk radioactive aspartate uptake experiments 
confirmed that both the labelled wild-type and H276,395-GltPh 
mutant transported substrate with rates similar to those of the unlabelled 
proteins, the mutant being about fourfold faster than the wild 
type (Fig. 1c). 
and labelled (red) transporters. Substrate uptake data are shown as averages of 
at least three experiments with error bars representing standard deviations. 
d, Proteoliposome attachment strategy. e, smFRET trajectories recorded for 
the wild type (WT) and mutant in proteoliposomes under transport conditions. 
f, Transition density plots corresponding to e. The average transition 
frequencies and the number of total transitions (Nt) are shown. Colour 
scale is from tan (lowest) to red (highest frequency). 


For smFRET measurements, reconstitution procedures were established 
to yield maximally one GltPh trimer per vesicle. Proteoliposomes 
were immobilized via biotinylated, fluorescently labelled GltPh within 
passivated quartz microfluidics chambers activated with a 
biotin-streptavidin bridge (Fig. 1d). Using this strategy, only those 
proteoliposomes containing GltPh oriented with the extracellular side facing the 
vesicle exterior were immobilized and imaged. Imaging experiments 
were initiated in the absence of substrates under isoelectric conditions 
and chemical gradients were established by rapidly exchanging the 
proteoliposomes into a buffer containing Na+ ions and aspartate. 
Additionally, we examined the behaviours of the labelled proteins in detergent 
micelles that afford higher signal-to-noise ratios and increased 
sample size. 


Transport rate and dynamics are correlated 

In both the absence and presence of gradients, wild-type GltPh in 
proteoliposomes showed spontaneous transitions between low-, 
intermediate- and high-FRET efficiency states centred at ~0.4, ~0.6 and 
~0.9, respectively (Fig. 1e, f and Extended Data Fig. 3). In detergent 
solutions, these FRET states were assigned to specific GltPh configurations: 
the low-FRET state reflects symmetric outward-facing and asymmetrically 
outward- and inward-facing configurations; intermediate- and 
high-FRET states reflect, respectively, asymmetrically inward- and 
outward-facing and simultaneously inward-facing protomers (Extended 
Data Fig. 2). In line with previous investigations, population FRET 
data from hundreds of individual proteins in the absence of gradients 
show that the transporter occupies the outward-facing, low-FRET 
state about half of the time in both detergent (46%) and lipid vesicles 
(54%) (Fig. 2a, Extended Data Fig. 3b and Extended Data Table 1a, b). 
[Figure 2 panel labels: WT and R276S/M395R; Apo, Na+/Asp bound, Na+/TBOA bound; n = 623, n = 346, n = 355.] 
Figure 2 | Ligand-dependent state distributions in detergent. a, c, In 
each panel, time-dependent FRET efficiency population contour plots 
(left) and cumulative population histograms (right) are shown for the 
wild type (a) and mutant (c). Experimental conditions are 
Transitions between low- and higher-FRET states reflect elevator-like 
movements of the individual transport domains between outward- and 
inward-facing configurations, respectively. In proteoliposomes, such 
transitions occurred at a rate of ~0.2 s⁻¹, roughly twofold less frequently 
than in detergent (Fig. 2a and Extended Data Fig. 3c). Paralleling the 
effects of substrate binding to wild-type GltPh in detergent (Fig. 2b), a 
modest population shift towards the outward-facing, low-FRET state 
occurred under active transport conditions achieved by addition of Na+ 
and aspartate (Extended Data Fig. 3b and Extended Data Table 1a, b). 
Na+ and aspartate also reduced transport domain dynamics by tenfold 
to ~0.02 s⁻¹ (Fig. 1f and Extended Data Fig. 3c). Thus, in the presence 
of chemical gradients, the frequency of transitions from outward- to 
inward-facing state (~0.01 s⁻¹) mirrored the rate of radioactive 
substrate uptake (~0.007 s⁻¹) (Fig. 1c). 

Notably, the H276,395-GltPh mutant only exhibited transitions 
between low- (0.4) and a single, higher- (0.65) FRET state in both 
proteoliposomes and detergent (Figs 1f and 2c and Extended Data Fig. 3). 
Similar to the wild-type protein in the absence of chemical gradients, 
the low-FRET state was occupied 60% of the time in detergent micelles 
and 40% in proteoliposomes (Extended Data Table 1a, b). The observed 
FRET transition frequency for H276,395-GltPh was also roughly twofold slower 
in proteoliposomes (~0.13 s⁻¹) than in detergent (~0.22 s⁻¹) 
(Fig. 2d and Extended Data Fig. 3c). 

However, in stark contrast to the wild-type protein, the transition 
frequency in H276,395-GltPh decreased by less than twofold to ~0.1 s⁻¹ 
when transport-supporting chemical gradients were established (Fig. 1e, f 
and Extended Data Fig. 3c). Here again, the frequency of transitions 
from the outward- to inward-facing FRET state (~0.05 s⁻¹) converged 
to the measured rate of substrate uptake (~0.03 s⁻¹) (Fig. 1c). The 
quantitative correspondence observed between the rates of smFRET transitions 
and uptake for the wild-type and H276,395-GltPh mutant proteins 
provides compelling evidence that elevator-like motions of transport 
domains mediate solute uptake and are critical steps of the transport 
cycle. This finding was independent of the proteoliposome immobilization 
strategy used and valinomycin-mediated electrical potentials 
(Extended Data Fig. 4a–d). 
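
The correspondence between transition frequency and uptake rate can be checked with simple arithmetic on the approximate values quoted in this section. The sketch below is illustrative only. The dictionary layout and the function name are ours, and the numbers are the rounded proteoliposome values given in the text.

```python
# Approximate rates quoted in the text, all in s^-1 (proteoliposomes).
rates = {
    "wild type": {
        "transitions_no_gradient": 0.2,
        "transitions_gradient": 0.02,
        "out_to_in": 0.01,
        "uptake": 0.007,
    },
    "H276,395-GltPh": {
        "transitions_no_gradient": 0.13,
        "transitions_gradient": 0.1,
        "out_to_in": 0.05,
        "uptake": 0.03,
    },
}


def summarize(r):
    """Return (fold slowdown on adding gradients, out-to-in rate / uptake rate)."""
    slowdown = r["transitions_no_gradient"] / r["transitions_gradient"]
    ratio = r["out_to_in"] / r["uptake"]
    return slowdown, ratio


for name, r in rates.items():
    slowdown, ratio = summarize(r)
    print(f"{name}: {slowdown:.1f}-fold slowdown, out-to-in/uptake = {ratio:.1f}")
```

For both proteins the outward-to-inward transition rate stays within a factor of two of the uptake rate. The slowdown on establishing gradients is about tenfold for the wild type and less than twofold for the mutant.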


H276,395-GltPh visits distinct inward-facing states 


In contrast to wild-type GltPh, which samples intermediate- (0.6) and 
high- (0.9) FRET states, H276,395-GltPh samples only a single 
higher-FRET (0.65) configuration (Extended Data Fig. 3). No excursions into 
the 0.9 FRET state were observed even when data were collected at 
sixfold higher time resolution (15 ms) (Extended Data Fig. 4e, f). The 
absence of the 0.9 FRET state would be expected if only one protomer 
within the H276,395-GltPh trimer transitioned into the inward-facing 
configuration at a time, while the formation of symmetric inward-facing 
states was disallowed. This model is, however, inconsistent with data 
analysed. b, d, Corresponding 
transition density plots (as in Fig. 1). 


showing that individual transport domains function independently. 
An alternative hypothesis is that the inward/outward and inward/inward 
configurations in H276,395-GltPh exhibit altered, overlapping FRET values. 
If this model is correct, then the gain-of-function mutations in 
H276,395-GltPh have altered the nature of the elevator-like transport domain 
motions and the structure of the inward-facing state. 


The energy landscape of H276,395-GltPh is altered 


Na+ and aspartate significantly stabilized the higher-FRET state of 
H276,395-GltPh in detergent micelles (Fig. 2c and Extended Data Table 1a, b). 
In detergent, Na+ and aspartate have access to both the extracellular 
and cytoplasmic sides of the protein. Assuming that a binding equilibrium 
is established in each conformation, these observations suggest 
that substrates bind more tightly to the inward-facing H276,395-GltPh 
conformation. Such a response was not observed for the wild-type GltPh, 
where substrate affinities of the inward- and outward-facing conformations 
are nearly the same and ligands stabilize the latter only slightly 
(Fig. 2a and Extended Data Table 1). Notably, the transporter blocker 
DL-threo-β-benzyloxyaspartate (TBOA) stabilized the outward-facing 
low-FRET states of both wild-type and H276,395-GltPh (Fig. 2 and 
Extended Data Fig. 5a, b). As above, this suggests that TBOA preferentially 
binds to the outward-facing state of both isoforms. Results consistent 
with these findings were obtained from ensemble DEER measurements 
using the protein spin-labelled on the same residue (Extended Data 
Fig. 5c). 

Interestingly, the addition of Na+ and aspartate to H276,395-GltPh 
proteoliposomes led to an increase in the outward-facing, low-FRET 
Figure 3 | Coupled Na+ and aspartate binding to H276,395-GltPh. 
a, Populations of low- (left) and higher- (right) FRET states determined for 
H276,395-GltPh in detergent micelles as functions of Na+ concentration in the 
presence of 0 (blue), 20 (black) and 100 µM (red) aspartate. Solid lines are fits 
to the Hill equation with Kd = 200, 30 and 15 mM, respectively, and an n value 
of 3. The data points shown are averages and standard errors from three 
independent biological replicates. b, Logarithmic plots of aspartate Kd values 
as functions of Na+ concentrations. Data are from isothermal titration 
calorimetry (ITC) (black) and smFRET (grey). The solid line through the 
data is a linear fit with slope 3.2. The extent of coupling between Na+ and 
aspartate binding is similar to wild type (dashed line). 
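
The Hill fits mentioned in this caption can be illustrated with a minimal sketch that evaluates the Hill function at the quoted midpoints (Kd = 200, 30 and 15 mM Na+ at 0, 20 and 100 µM aspartate, with n = 3). The function name and the two Na+ concentrations sampled are our own choices. The fitted curves in the figure presumably also include amplitude and offset terms that the caption does not give, so only the normalized shape is shown here.

```python
def hill(x, kd, n):
    """Normalized Hill function: fractional saturation at ligand concentration x."""
    return x**n / (kd**n + x**n)


# Na+ midpoints (mM) quoted in the caption, keyed by aspartate concentration (µM).
midpoints_mM = {0: 200.0, 20: 30.0, 100: 15.0}
hill_n = 3

for asp_uM, kd in midpoints_mM.items():
    low = hill(10.0, kd, hill_n)    # 10 mM Na+ (arbitrary sampling point)
    high = hill(100.0, kd, hill_n)  # 100 mM Na+ (arbitrary sampling point)
    print(f"{asp_uM} µM Asp: saturation {low:.3f} at 10 mM, {high:.3f} at 100 mM Na+")
```

With n = 3 the transition from low to high saturation is steep, and raising the aspartate concentration shifts the midpoint to lower Na+ concentrations, which is the coupling the caption describes.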
Figure 4 | Unimodal dynamic behaviour of H276,395-GltPh. a–d, Dwell 
time distributions (a, c) and average dwell times (b, d) for the low- (low, solid 
lines) and intermediate- and high-FRET states (int + high, dashed lines) 
obtained for the wild type (a, b) and H276,395-GltPh (c, d) in detergent. The 
distributions for apo (blue) and Na+/aspartate-bound proteins (red) were 
fitted to a probability density function. The fitted time constants are in 
Extended Data Table 1c. Average dwells are plotted as functions of Na+ 
concentration in the presence of 10 and 100 µM aspartate for wild type and 
H276,395-GltPh, respectively. Solid lines are fits to the Hill equation with 
Kd = 15 mM and n = 3.2 for wild type and Kd = 19 mM and n = 3.2 for 
H276,395-GltPh. The data points shown are averages and standard errors from 
three independent biological replicates. 
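
To make the idea of fitting dwell times to a probability density function concrete, the following minimal sketch assumes a single-exponential density, p(t) = exp(-t/τ)/τ, which is the simplest model for a unimodal lifetime distribution. The caption does not specify the functional form used, so this choice, the function names and the simulated data are all assumptions for illustration. For this model the maximum-likelihood estimate of τ is the sample mean.

```python
import math
import random


def exp_pdf(t, tau):
    """Single-exponential dwell-time density with time constant tau."""
    return math.exp(-t / tau) / tau


def fit_tau(dwells):
    """Maximum-likelihood time constant for a single exponential (the sample mean)."""
    return sum(dwells) / len(dwells)


random.seed(0)
true_tau = 5.0  # seconds; arbitrary illustrative value, not taken from the paper
dwells = [random.expovariate(1.0 / true_tau) for _ in range(5000)]
tau_hat = fit_tau(dwells)
print(f"estimated time constant: {tau_hat:.2f} s")
```

A population that mixes dynamic and quiescent periods would instead need at least two exponential components, which is one way to distinguish heterogeneous from unimodal behaviour.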


population (Extended Data Fig. 3). This liposome-specific response to 
substrate addition provides supporting evidence for bilayer integrity. 
It also reveals that elevator-like transport domain movements, as 
opposed to substrate release, are rate-limiting in the H276,395-GltPh 
transport cycle. If substrate release were slow compared to the domain 
movements, the state distributions during uptake would mirror those 
observed in detergent, that is, show preference for the inward-facing 
higher-FRET state. 

The effect of substrate on the distribution of FRET states observed 
for both isoforms was concentration-dependent in detergent micelles. 
Notably, H276,395-GltPh exhibited an ~1,000-fold increase in apparent 
substrate dissociation constant (Kd) compared to the wild-type 
protein (Fig. 3 and Extended Data Fig. 6a). This finding was corroborated 
by bulk measurements (Extended Data Fig. 6b, c). Hence, the 
H276,395-GltPh mutations affect both transport domain dynamics and substrate 
affinity, even though neither of the mutated residues coordinates 
aspartate directly in the existing crystal structures. These observations 
support the hypothesis that substrate binding and transport domain 
dynamics are physically coupled. 


Locked states are destabilized in H276,395-GltPh 


The coexistence of quiescent and dynamic periods evidenced both in 
the absence and presence of ligands is a hallmark kinetic feature of 
wild-type GltPh. Binding of Na+ and aspartate increases the prevalence 
of quiescent periods and thus the average FRET state lifetimes 
(Fig. 4a, b). Strikingly, no evidence was found for quiescent periods in 
the H276,395-GltPh mutant (Fig. 4c) and rapid transport domain 
dynamics persisted even in the presence of saturating substrate 
concentrations (Fig. 2d and Extended Data Table 1c). These dynamic processes 
were efficiently blocked by TBOA (Fig. 2b, d), consistent with their 
putative role in substrate transport. In H276,395-GltPh, substrate binding 
increased the lifetime of the high-FRET state (~sevenfold), with no 
detectable impact on the low-FRET state lifetime (Fig. 4d). In both the 
absence and presence of ligands, the low- and higher-FRET state 
lifetimes were unimodal (Fig. 4c, d and Extended Data Table 1c). These 
findings suggest that in H276,395-GltPh the isomerization steps leading 
to locked configurations of the wild-type protein are strikingly altered 
or inaccessible under the conditions examined, although an allosteric 
coupling between substrate binding and stabilization of the domain 
interface still exists. 


Structure of the inward-facing H276,395-GltPh 


To probe the underpinnings of the altered properties of H276,395-GltPh, 
we determined a crystal structure of the protein bound to Na+ ions and 
aspartate at a moderate resolution of ~4.5 Å (Extended Data Fig. 7). As 
expected from smFRET experiments (Fig. 2c), the structural model 
clearly showed that all protomers in the trimer spontaneously adopted 


Figure 5 | Crystal structure of the H276,395-GltPh. 
a, Single protomers of inward-facing wild type 
(left), locked mutant (centre) and unlocked mutant 
(right) in surface representation, coloured as in 
Fig. 1; HP2 is red. Residues 276 and 395 are 
coloured by atom type. The approximate limits of 
the hydrocarbon layer of the membrane are shown 
as dashed lines. b, Substrate binding sites 
(enlarged) viewed from the cytoplasm. HP1 and 
HP2 are in cartoon representation; aspartate 
(black) and residues 276, 395 and Asp 394 
(coloured by atom type) are emphasized as spheres. 
Arrowhead (cyan) marks the region of increased 
solvent accessibility. c, Cytoplasmic view of the 
unlocked protomer showing the crevice at the 
domain interface. Dashed line replaces TM2-TM3 
loop for clarity. Arrows indicate regions of 
increased water and lipid accessibility. Open 
conformations of HP2 were modelled based on the 
TBOA-bound (green) structure of GltPh. 
inward-facing configurations. The model also revealed that the transport 
domain’s orientations differed from those previously captured in 
GltPh structures, both with and without stabilizing crosslinks. Moreover, 
the trimer was asymmetric, with the transport domain of protomer A 
occupying a position distinct from the other two. 

In protomer A, the transport domain shifts further inward by 2 Å 
and rotates by 7° around an axis roughly perpendicular to the membrane 
plane with respect to the wild type (Fig. 5a). This rearrangement 
is accommodated by a concerted movement of helices in the scaffold 
domain, comprising TM1 and peripheral portions of TM2 and TM5 
(Extended Data Fig. 7c), whose flexible nature was already noted. This 
conformation resembles the inward-facing, locked state of the wild type 
in the close packing observed between the transport domain and the 
trimerization scaffold (Extended Data Fig. 7d). Molecular dynamics 
simulations revealed that whereas Arg 276 in the wild type forms 
hydrogen bonds with Asp 394 and bulk water molecules, the corresponding 
Arg 395 in H276,395-GltPh faces the hydrophobic core of the bilayer. 
The resulting membrane remodelling is driven by the hydrophobic 
matching force, and is established by interactions of the Arg 395 
side chain with penetrating lipid phosphate groups and accompanying 
water molecules (Extended Data Fig. 8). Consequently, the penetrating 
polar moieties are positioned in an otherwise hydrophobic region 
of H276,395-GltPh, which can destabilize the inward-facing, locked 
conformation and increase water accessibility to the substrate-binding site 
and to the domain interface (Fig. 5b). 

In protomers B and C, the transport domains undergo identical and
more striking changes (Fig. 5a), each swinging away from the
trimerization scaffold by about 12° compared to locked protomer A.
Consequently, a large crevice opens between HP2 and the scaffold, reducing
the interface between the transport and scaffold domains from ~1,300 Å²
to ~900 Å² and allowing access to water, detergent or lipid molecules
(Fig. 5c). This unusual, apparently unlocked, conformation was observed
in two protomers occupying distinct crystal packing environments
and therefore seems to be determined by the properties of the
protein itself and not by crystal contacts. The crevice it generates is
largely hydrophobic, and closes rapidly in molecular dynamics simulations
when solvated only by water (Extended Data Fig. 9a–c). In contrast, the
open interface between transport and trimerization domains is stable
with lipids positioned in this space (Extended Data Fig. 9d–g),
suggesting that solvation by lipid or detergent molecules is necessary.
Notably, this crevice may allow HP2, whose gating role in the
outward-facing state is well established, to open when the transport
domain is inward facing (Fig. 5c). If so, substrate release might be
facilitated in the unlocked conformation, a notion compatible with the
markedly reduced substrate affinity of this mutant.

Discussion


Conformational transitions between outward- and inward-facing states
are key events in transport cycles of secondary active transporters.
In glutamate transporters and possibly other families, such transitions
involve elevator-like movements of the substrate-binding domains
supported by relatively rigid scaffold domains. The frequency of such
transitions in GltPh in lipid bilayers and in the presence of physiological
ionic gradients parallels the turnover rate of substrate uptake. This
relationship also holds in a gain-of-function mutant H276,395-GltPh that
exhibits a 1,000-fold decreased substrate affinity and a fourfold faster
uptake rate. Collectively, our observations establish a direct correlation
between the transport domain movements and substrate transport, and
suggest an inverse relationship between substrate affinity and transport
domain motions. The H276,395-GltPh mutant is special in this regard, as
other point mutations impact dynamics only and do not potentiate
transport.

The observed dynamic signatures strongly suggest that the rate-limiting
step in this process is the unlocking of the transport domain
from the trimerization scaffold (Fig. 6a). Although both the wild-type
and the H276,395-GltPh proteins exhibit similar transport domain
structures and translocate similarly positioned charged groups (including
Arg 276 in the wild type and Arg 395 in the mutant), locked states are
relatively unstable in the H276,395-GltPh mutant, leading to overall faster
dynamics and uptake.

The locked and unlocked configurations of wild-type GltPh,
corresponding to quiescent and dynamic periods, respectively, coexist and
interconvert spontaneously, which suggests that outward- and
inward-facing states of GltPh, and by extension EAATs, should be viewed as
structurally heterogeneous ensembles. Increased quiescent period
durations in the presence of substrate further suggest that ligand binding
is allosterically coupled to the formation of locked states. Based on these
insights, we propose a simplified kinetic framework for the transport
cycle that recapitulates the most salient experimentally observed
features (Fig. 6b, Extended Data Fig. 10). The specific relationship of
crystallographic snapshots of GltPh and related proteins to the topological
features of this framework will need to be examined carefully.

The structure of H276,395-GltPh (Fig. 6c) captures an unlocked
configuration that appears relevant to the proposed transport cycle and
uniquely suitable for ligand binding and release. Although the
molecular basis of how the mutations in H276,395-GltPh affect the
locked-unlocked isomerization requires further investigation, molecular
dynamics simulations suggest that protein-lipid interactions are pivotal
(Extended Data Fig. 9). The proposed role for the lipid hydrophobic tails
in facilitating domain unlocking complements previous hypotheses that
transient interface hydration facilitates transport domain translocation.

That two closely related GltPh isoforms exhibit distinct kinetic and
structural signatures foreshadows the possibility that human EAATs
differ substantially from GltPh, especially in their dynamic properties.
Probing EAATs directly is therefore essential, particularly since the
extent to which they might be diverted to kinetically stable, potentially
off-pathway states may represent a regulatory modality.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS


Sample size. No statistical methods were used to predetermine sample size.
DNA manipulations, protein expression, purification and labelling. Single
cysteine mutations were introduced by site-directed mutagenesis (Stratagene)
of a cysteine-less GltPh background, in which seven non-conserved residues
had been replaced with histidines resulting in improved expression levels
(termed GltPh here for brevity). Constructs were verified by DNA sequencing
and transformed into E. coli DH10-B cells (Invitrogen). Proteins were
expressed as C-terminal (His)8 fusions as described previously. Briefly,
isolated cell membranes were re-suspended in buffer A, containing 20 mM
HEPES/NaOH, pH 7.4, 200 mM NaCl, 0.1 mM L-aspartate, 0.1 mM
Tris(2-carboxyethyl)phosphine (TCEP). Membranes were solubilized in the
presence of 40 mM n-dodecyl β-D-maltopyranoside (DDM) for 1 h
at 4 °C. Solubilized transporters were purified by metal-affinity
chromatography in buffer A supplemented with 1 mM DDM and eluted in 250 mM
imidazole. The (His)8-tag was cleaved by thrombin and proteins were further
purified by size-exclusion chromatography (SEC). For smFRET experiments,
protein samples at 40 µM were labelled with a mixture of maleimide-activated
Cy3 and Cy5 dyes that exhibit enhanced photostability as well as
biotin-PEG11, at concentrations of 50, 100 and 25 µM, respectively, for
30 min at room temperature. Labelled proteins were purified away from the
excess reagents by SEC. Their purity and specificity of labelling were
assessed by SDS-PAGE, which was followed by fluorescence imaging and
Coomassie staining.

Protein reconstitution into liposomes for smFRET analysis and transport
assays. Labelled and unlabelled GltPh variants were reconstituted into
liposomes as previously described. Briefly, liposomes, prepared from a 3:1
(w/w) mixture of E. coli total lipid extract and egg yolk
phosphatidylcholine (Avanti Polar Lipids) in a buffer containing 20 mM
Tris/HEPES, pH 7.4 and 100 mM KCl, were destabilized by addition of Triton
X-100 at a detergent to lipid ratio of 0.5:1 (w/w). For reconstitution,
proteins were added to lipids at a final protein to lipid ratio of 1:1,000
(w/w) and incubated for 30 min at room temperature. Detergents were removed
by repeated incubations with Biobeads as described. For smFRET and
radioactive substrate uptake experiments, the same proteoliposomes were
extruded through 100 nm and 400 nm filters, respectively. This
reconstitution strategy yields at most 1 and 16 GltPh trimers per vesicle,
respectively. Radioactive substrate uptake was measured as previously
described. Briefly, proteoliposomes were diluted into reaction buffer
containing 20 mM Tris/HEPES, pH 7.4, 100 mM NaCl and 0.3 µM
[3H]-L-aspartate at room temperature. Aliquots were removed at appropriate
times, diluted in ice-cold quenching buffer (20 mM Tris/HEPES, pH 7.4,
100 mM LiCl) and filtered through 0.22 µm filters (Millipore). Protein
concentration was estimated by the absorbance at 280 nm after correcting for
the fluorophore contributions to the value. The amount of substrate uptake
was normalized per mole of GltPh monomers.
smFRET experiments. All experiments were performed using a home-built,
prism-based total internal reflection fluorescence microscope constructed
around a Nikon TE2000 Eclipse inverted microscope body using
streptavidin-coated, passivated microfluidic imaging chambers. Except when
stated otherwise, labelled proteins (either detergent solubilized or
liposome-reconstituted) were surface-immobilized via a biotin-streptavidin
bridge. Except when stated otherwise, imaging experiments were performed in
a buffer containing: 20 mM HEPES/Tris (pH 7.4), 5 mM BME, an enzymatic
oxygen scavenger system comprising 1 U ml⁻¹ glucose oxidase (Sigma),
8 U ml⁻¹ catalase (Sigma) and 0.1% glucose. In addition, apo-GltPh
experiments included 200 mM KCl, Na+/Asp-bound experiments included 200 mM
NaCl and 0.1 mM aspartate and Na+/TBOA-bound experiments included 200 mM
NaCl and 10 mM TBOA. For experiments in detergent micelles, the buffers
were also supplemented with 1 mM DDM. For imaging under transport
conditions, the experiments were initiated in the absence of substrates
(apo condition) on both sides of the membrane and chemical gradients were
established by rapidly exchanging the proteoliposomes into an uptake buffer
containing 100 mM NaCl and 100 µM aspartate. All data were collected at an
imaging rate of 10 s⁻¹ (100 ms integration time), except when otherwise
stated. Fluorescence trajectories were selected for analysis using
custom-made software implemented in Matlab (Mathworks) according to the
following criteria: a single catastrophic photobleaching event; over 8:1
signal-to-background noise ratio; a FRET lifetime of at least 5 s. FRET
trajectories were calculated from the acquired intensities,
$I_{\mathrm{Cy3}}$ and $I_{\mathrm{Cy5}}$, using the formula
$\mathrm{FRET} = I_{\mathrm{Cy5}}/(I_{\mathrm{Cy3}} + I_{\mathrm{Cy5}})$.
Population contour plots were constructed by superimposing the FRET data
from individual traces. Histograms of these population data were fit to
Gaussian functions in Origin (OriginLab). The relative populations and
dwell time distributions of each FRET state, as well as the transition
frequencies between them, were obtained by idealizing the smFRET traces
using QuB. Transition density plots and the dwell time survival plots were
plotted and fitted as described previously. The logarithmic histograms of
the dwell times were fitted to transformed probability density functions.
Over 300 molecules are included in each smFRET experiment to ensure that
the experimental margin of error in the mean value of each distinct FRET
state across the three experiments is less than 5%.
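The FRET calculation and the trace-selection criteria stated above can be illustrated with a minimal Python sketch. This is not the custom Matlab software used in the study; the function names, argument layout and the way photobleaching events are counted are assumptions made only for illustration.

```python
def fret_efficiency(i_cy3, i_cy5):
    """Apparent FRET per frame: I_Cy5 / (I_Cy3 + I_Cy5)."""
    trace = []
    for donor, acceptor in zip(i_cy3, i_cy5):
        total = donor + acceptor
        # Frames with no signal (for example after photobleaching) are
        # reported as NaN rather than dividing by zero.
        trace.append(acceptor / total if total > 0 else float("nan"))
    return trace


def passes_selection(n_bleach_events, signal_to_background, fret_frames,
                     frame_time_s=0.1):
    """Apply the three criteria quoted in the text (hypothetical inputs).

    n_bleach_events      -- number of catastrophic photobleaching steps
    signal_to_background -- signal-to-background noise ratio
    fret_frames          -- number of frames with FRET before bleaching
    frame_time_s         -- integration time (100 ms in the text)
    """
    lifetime_s = fret_frames * frame_time_s
    return (n_bleach_events == 1
            and signal_to_background > 8.0
            and lifetime_s >= 5.0)
```

For example, `fret_efficiency([100, 50], [100, 150])` gives a two-frame trace of 0.5 and 0.75, and a trace with one bleaching step, a 10:1 signal-to-background ratio and 60 frames (6 s) of FRET would be retained.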


Crystallography. The R276S/M395R GltPh mutant was purified by SEC in buffer
containing 10 mM Tris/HEPES, pH 7.4, 100 mM NaCl and 7 mM
n-decyl-β-D-maltopyranoside (DM). Protein solution at 3.5 mg ml⁻¹ was mixed
at 1:1 (v:v) ratio with the reservoir solution, containing 50 mM sodium
acetate, pH 5.6–6, 18–20% PEG 400 and 100–150 mM magnesium acetate, and
crystallized at 4 °C by hanging-drop vapour diffusion. Crystals were
cryoprotected in reservoir solution. Diffraction data were collected at
National Synchrotron Light Source beamline X29. Diffraction data were
indexed, integrated and scaled using the HKL2000 package. Anisotropy
correction was applied as described previously. Further analyses were
performed using CCP4 programs. Initial phases were determined by molecular
replacement in Phaser using transport and trimeric scaffold domains as
separate search models. The model was optimized by rounds of manual
rebuilding in Coot and refinement in Refmac5 with TLS. During refinement,
strict non-crystallographic threefold symmetry constraints were applied to
the three transport domains and to regions of the scaffold domain that are
involved in trimerization interactions. In addition, strict twofold
symmetry constraints were applied to the entire B and C protomers, which
exhibited identical positions of the transport domain. For the
outward- and inward-facing states, published coordinates were used with
accession numbers 2NWX (ref. 3) and 3KBC (ref. 4), respectively. For the
open conformation of HP2 the accession number of the coordinates is 4OYF
(ref. 6). All structural renderings were generated using PyMOL.

DEER measurements and data analysis. Measurements were performed at 60 K
using a 17.3 GHz home-built Ku-band pulse spectrometer. A standard four-pulse
DEER sequence with π/2-π-π pulse widths of 16 ns, 32 ns and 32 ns,
respectively, and a 32 ns π pump pulse was used routinely. The frequency
separation between detection and pump pulses was 70 MHz. The detection
pulses were positioned at the low-field edge of the nitroxide spectrum. The
homogeneous background was removed from the raw time-domain signals and the
distances were reconstructed from the baseline-corrected and normalized
signals by using the Tikhonov regularization method and refined by the
maximum entropy method.

Molecular modelling. Molecular dynamics simulations using the Charmm27
force field (FF) and updated lipid FF were prepared as described previously
and run using the NAMD 2.9 (ref. 47) software at 300 K with PME
electrostatics and standard parameters for the Charmm FF. Atomic
coordinates for the inward-facing wild-type GltPh were taken from PDB entry
3KBC (ref. 4). Simulations with the Gromos 54a7 FF were prepared using the
LAMBADA / InflateGRO membrane-embedding protocol and run with the Gromacs
4.6.1 (ref. 50) simulation package with reaction-field electrostatics and
standard cutoffs for the Gromos FF. All simulations included pure POPC
membranes, except Charmm27 Trajectory 3 (Extended Data Fig. 9), which
contained a mixture of 18% POPC, 52% POPE, and 30% POPG (prepared with the
Charmm-GUI web-tool), more similar to the composition of the liposomes used
in experiments. In selected simulations (Extended Data Fig. 9), backbone Cα
atoms were subjected to harmonic restraint potentials centred on positions
from the X-ray structure with a harmonic constant of 0.1 kcal mol⁻¹ Å⁻²
(NAMD) or 0.24 kcal mol⁻¹ Å⁻² (Gromacs). Docking of detergent and POPC lipid
molecules was performed with Autodock Vina within the Chimera 1.8
visualization software. Lipid insertion in Charmm27 Trajectory 3 was
performed as follows: (i) a frame from the molecular dynamics trajectory
after 48 ns of simulation time was selected; (ii) several lipid molecules
restricted to various regions of the interfaces in protomers A and C were
docked, ignoring the water; (iii) docking poses among the highest ranked
from all docking runs were combined, such that lipid molecules fill the
available hydrophobic pockets without clashing with each other, and
overlapping water molecules were discarded; (iv) local minimization was
performed with the Charmm27 force field, including solvent and side chains
within 5 Å of inserted lipids; (v) the molecular dynamics simulation was
restarted at 300 K. Data processing and plots were performed in Matlab
(Mathworks).

Kinetic simulations of smFRET data. For the simulations, we assumed that
protomer motions are independent. The model presented in Fig. 6 was employed
to simulate the motions of individual protomers between outward- and
inward-facing orientations in QuB. The time-dependent configurations of two
protomers were then assigned to FRET states as described (Extended Data
Fig. 3). FRET traces were generated at 100 ms time-resolution in Matlab
(Mathworks) using a Gaussian distribution of FRET efficiency values and
widths derived from our experimental data. Initial estimates of the kinetic
parameters were based on exponential fits of the experimental dwell time
distributions (Extended Data Table 1c). The parameters were then manually
optimized to recapitulate the experimental observables: population FRET
histograms, TDPs and the dwell-time histograms (Extended Data Fig. 10).
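As a rough illustration of the independence assumption and of the assignment of protomer-pair configurations to FRET states, the following Python sketch simulates two independent two-state protomers and generates a noisy FRET trace at 100 ms resolution. It is a deliberately reduced stand-in, not the model of Fig. 6 and not the QuB/Matlab workflow: the two-state scheme, the rate constants, the FRET means and the Gaussian width are all hypothetical placeholders.

```python
import math
import random

# Illustrative FRET levels and noise width (assumed values, not fitted).
FRET_MEANS = {"low": 0.40, "intermediate": 0.59, "high": 0.82}
FRET_SIGMA = 0.10


def pair_populations(p_out):
    """Expected pair populations for independent protomers.

    Low = OF/OF + OF/IF, intermediate = IF/OF, high = IF/IF,
    following the assignment in Extended Data Table 1.
    """
    p_in = 1.0 - p_out
    low = p_out * p_out + p_out * p_in
    return low, p_in * p_out, p_in * p_in


def simulate_protomer(n_frames, k_out_to_in, k_in_to_out, dt, rng):
    """Two-state Markov chain sampled every dt seconds."""
    p_out = k_in_to_out / (k_out_to_in + k_in_to_out)
    state = "OF" if rng.random() < p_out else "IF"
    states = []
    for _ in range(n_frames):
        states.append(state)
        rate = k_out_to_in if state == "OF" else k_in_to_out
        # Probability of leaving the current state within one frame.
        if rng.random() < 1.0 - math.exp(-rate * dt):
            state = "IF" if state == "OF" else "OF"
    return states


def pair_state(first, second):
    """Map an ordered protomer pair to a FRET state label."""
    if first == "OF":
        return "low"
    return "intermediate" if second == "OF" else "high"


def simulate_fret_trace(n_frames, k_out_to_in, k_in_to_out, dt=0.1, seed=0):
    """Return (state labels, noisy FRET values) for one labelled pair."""
    rng = random.Random(seed)
    a = simulate_protomer(n_frames, k_out_to_in, k_in_to_out, dt, rng)
    b = simulate_protomer(n_frames, k_out_to_in, k_in_to_out, dt, rng)
    labels = [pair_state(x, y) for x, y in zip(a, b)]
    trace = [rng.gauss(FRET_MEANS[s], FRET_SIGMA) for s in labels]
    return labels, trace
```

With `p_out = 0.5`, `pair_populations` returns 0.5, 0.25 and 0.25 for the low, intermediate and high states. Comparing such analytic populations with a histogram of a long simulated trace is one simple way to check a candidate set of rate constants before any manual optimization.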

Extended Data Figure 1 | Elevator model of transport and spatial
conservation of a positively charged residue in glutamate transporter
family. a, GltPh protomers in the outward- (left) and inward-facing (right)
conformation are shown in surface representation and viewed in membrane
plane. Dashed lines represent an approximate position of the membrane
hydrocarbon layer. In the inward-facing state, the transport domain (blue) is
moved by ~15 Å across the bilayer relative to the trimerization domain (beige).
b, Schematic representation of dynamic mode-switching between stable and
transient conformations. c, A single GltPh protomer is shown in cartoon
representation. Cyan balls emphasize the amino acid positions at which
potentially positively charged residues occur in glutamate transporter
homologues. d, Occurrence frequencies of these residues at the marked
positions (GltPh numbering). To obtain the frequencies, sequences were
harvested from the PFAM database (accession code PF00375). Sequences
were parsed to exclude those with over 70% identity and aligned using
Clustal Omega.

Position    Frequency (%)
276         24
356         1
357         6
391         40
395         9
398         1
No R/K/H    20
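The kind of tally reported in panel d can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sequences are taken to be already aligned (the caption filters by identity before alignment with Clustal Omega), identity is computed over columns where neither sequence has a gap, and the greedy filter and all function names are hypothetical rather than the procedure actually used.

```python
POSITIVE = set("RKH")  # potentially positively charged residues


def identity(seq_a, seq_b):
    """Fraction of identical residues over columns without gaps."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)


def filter_by_identity(aligned, max_identity=0.70):
    """Greedily keep sequences that are <= max_identity to all kept ones."""
    kept = []
    for seq in aligned:
        if all(identity(seq, other) <= max_identity for other in kept):
            kept.append(seq)
    return kept


def positive_frequency(aligned, column):
    """Percentage of sequences with R, K or H at an alignment column."""
    hits = sum(1 for seq in aligned if seq[column] in POSITIVE)
    return 100.0 * hits / len(aligned)
```

On a toy alignment such as `["ARND", "ARNE", "GKCD", "TSCW"]`, the second sequence is dropped (75% identical to the first), and two of the three remaining sequences carry a positive residue in the second column.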

Extended Data Figure 2 | Assignment of FRET efficiency states. a, Shown
are the crystal structures of GltPh trimers in symmetrical outward (OF)- and
inward (IF)-facing states and a model of an asymmetric configuration with two
outward- and one inward-facing protomers. The structures are shown in
surface representation and coloured as in Extended Data Fig. 1. Black lines
connect Cα atoms of residue 378, and the corresponding distances are indicated
above the structures. b, Expected FRET efficiency levels for these distances for
all possible configurations of subunit pairs: outward/outward (OF/OF),
outward/inward (OF/IF), inward/outward (IF/OF) and inward/inward
(IF/IF). c, Intramolecularly stabilized 4S(COT)-maleimide Cy3 (n = 1) and
Cy5 (n = 2) fluorophores used in this study synthesized as described
previously with the addition of two sulfonate groups for increased solubility.
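For orientation, the standard Förster relation between inter-dye distance and FRET efficiency, E = 1/(1 + (r/R0)^6), can be evaluated as in the short Python sketch below. The caption does not state which R0 or which distance-to-efficiency conversion was used, so the function name and any numerical inputs (for example an R0 of about 60 Å) are assumptions for illustration only.

```python
def fret_from_distance(r, r0):
    """Förster relation E = 1 / (1 + (r / r0)**6).

    r  -- donor-acceptor distance
    r0 -- Förster radius in the same units (an assumed input here)
    """
    if r < 0 or r0 <= 0:
        raise ValueError("r must be non-negative and r0 positive")
    return 1.0 / (1.0 + (r / r0) ** 6)
```

At r = R0 the efficiency is 0.5 by definition, and shorter distances give higher efficiency, which is why pairs of inward-facing protomers, whose labelled residues lie closer together, would be expected to report the highest FRET state.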

Extended Data Figure 3 | Conformational state distributions of wild-type
and H276,395-GltPh in proteoliposomes. a, Examples of smFRET recordings.
Top panels show raw fluorescent signals originating from donor (green)
and acceptor (red) dyes. Bottom panels show changes of FRET efficiency
calculated from raw data (blue). Red solid lines through the data are
idealizations obtained using QuB software. b, Contour plots and
one-dimensional population histograms in the absence and presence of Na+ and
[...] outside of the vesicles are shown above the panels. Wild-type and
H276,395-GltPh histograms are fitted to three and two Gaussian functions,
respectively. c, Transition density (TD) plots for the wild type (left) and
H276,395-GltPh (right) in proteoliposomes in the absence of Na+ and aspartate
in the external buffer. d, Means and widths (in brackets) of FRET efficiency
distributions derived from Gaussian fits to proteoliposome data in
comparison to detergent data.

Condition          Means (widths) of fitted FRET states
Na+, TBOA          0.44 (0.16)    0.7 (0.16)     0.88 (0.1)
Apo                0.4 (0.18)     0.59 (0.18)    0.82 (0.17)
Transport          0.42 (0.16)    0.63 (0.16)    0.83 (0.16)
Apo                0.37 (0.22)    0.66 (0.2)
Na+, aspartate     0.46 (0.16)    0.71 (0.15)
Na+, TBOA          0.43 (0.18)    0.68 (0.15)
Apo                0.44 (0.22)    0.67 (0.2)
Transport          0.42 (0.23)    0.67 (0.23)

Extended Data Figure 4 | Single-molecule dynamics using different
liposome-attachment strategies and with higher time-resolution.
a–d, Dynamic properties of H276,395-GltPh under transport conditions using a
different surface-immobilization strategy and in the presence of electrical
potential. a, Surface-immobilization strategy for proteoliposomes using
His-tagged lipids. b, Transition frequencies for wild-type (top) and
H276,395-GltPh (bottom) trimers reconstituted into His-tagged liposomes that
were site-specifically labelled in just two protomers with intramolecularly
photostabilized Cy3 and Cy5 fluorophores. c, A negative inside voltage
potential was established in proteoliposomes by adding valinomycin to the
uptake buffer. d, Transition frequencies for wild-type (top) and
H276,395-GltPh (bottom) in the presence of valinomycin. Each experiment shown
includes statistics based on >250 individual molecules. The standard error in
transition frequency measurements is approximately 0.015 s⁻¹. e, f, Dynamic
properties of H276,395-GltPh probed at 15 ms time resolution. Contour plots
and one-dimensional population FRET efficiency histograms (e) observed for
the humanized mutant in detergent solution in the absence (left) and presence
(right) of 100 mM NaCl and 100 µM aspartate. Examples of single-molecule
trajectories are shown in f.

Extended Data Figure 5 | Population changes in response to ligand binding.
a, b, TBOA binding to H276,395-GltPh measured in smFRET experiments.
Contour plots and population FRET efficiency histograms in the presence of
increasing concentrations of TBOA (a). Changes in low- (red) and high- (blue)
FRET state populations as a function of TBOA concentration (b). Solid lines
through the data correspond to the Hill equation
$y = y_{\min} + (y_{\max} - y_{\min})\,x^{n}/(x^{n} + K_{d}^{n})$
with $K_{d}$ = 2.4 mM and n = 1. The data points shown are averages and
standard errors from three independent biological replicates.
c, Experimental time domain DEER data (left) and reconstructed distance
distributions (right) for H276,395-GltPh (shown in colours) and wild-type
transporter (black) spin-labelled on residue Cys378 in detergent solution. The
data were collected in the absence of ligands (top), in the presence of 100 mM
Na+ and 350 µM aspartate (middle) and in the presence of 100 mM Na+
and 480 µM TBOA (bottom). The red arrows above the distance distributions
mark distances between residues 378 extracted from crystal structures of the
symmetric outward- (OF/OF) and inward- (IF/IF) facing states. The data for
the wild-type transporter were adapted from a published study. The data show
that in the apo transporter, outward- and inward-facing states are similarly
populated. Binding of Na+ ions and aspartate favours the inward-facing state,
whereas binding of TBOA favours the outward-facing state.

Extended Data Figure 6 | Aspartate binding experiments. a, FRET efficiency
population contour plots determined for H276,395-GltPh in detergent micelles in
the presence of 100 µM aspartate and increasing concentrations of Na+ ions
(indicated above the panels). b, c, Representative aspartate binding isotherms
derived from ITC experiments for the wild-type GltPh (b) and H276,395-GltPh
(c) in the presence of 10 mM Na+ and 100 mM Na+, respectively. The binding
of aspartate to H276,395-GltPh in the presence of 10 mM Na+ is too weak to
measure (inset). Binding experiments were performed using small-volume
Nano ITC (TA Instruments). Upper panels show raw data. The cell contained
30 µM (WT-GltPh) and 40 µM (H276,395-GltPh) protein in buffer containing
20 mM HEPES/Tris, pH 7.4 and 0.1 mM DDM and indicated concentrations of
NaCl. The syringe contained Asp at 200 µM concentration in the same buffer;
every injection contained 5 µl. Data were processed and analysed using
manufacturer’s software (lower panels). Solid lines through the data are fits to
independent binding sites model with the following Kd, enthalpy (ΔH), and
apparent number of binding sites (n): 380 nM, 15 kcal per mol and 0.65 for the
wild-type transporter, and 285 nM, 16 kcal per mol and 0.68 for H276,395-GltPh.

Extended Data Figure 7 | Data collection and refinement for Na+ and
aspartate bound H276,395-GltPh. a, Table showing data collection and
refinement statistics. Scaling and refinement statistics were obtained after
anisotropy correction by ellipsoidal truncation using high-resolution cutoffs of
4.9 Å along the a and b axis, and of 4.2 Å along the c axis. b, Stereoview of the
2Fo−Fc electron density map for H276,395-GltPh contoured at 1.5 σ around
residue Arg 395 in unlocked protomer C. Protein backbone (maroon) is shown
in cartoon representations and side chains are shown as lines and coloured by
atom type. c, Superimposed scaffold domains of the inward-facing wild type
and H276,395-GltPh are shown in cartoon representation. The labile portions
are coloured cyan (wild type) and magenta (mutant). Helices bend at conserved
Pro 60 and Pro 206 residues (spheres). d, Locked (left) and unlocked (right)
mutant protomers viewed from the cytoplasm and shown in surface
representation.

Extended Data Figure 8 | Arg395 adapts to its environment. a, The arginine
side chain (Arg 276 in the wild type; Arg 395 in H276,395-GltPh) is seen in
molecular dynamics simulations to engage in hydrogen-bonding interactions.
The extent of the hydrogen bond formation is shown as a function of
simulation time in Charmm Trajectory 3 (see Extended Data Fig. 10). The main
interactions of the arginine in both mutant and wild type are with water
molecules, but the locations of the waters are very different. In H276,395-GltPh,
the Arg 395 side chain is located 5 to 9 Å below the level of the membrane
surface, so that the water molecules are those penetrating the
membrane-protein interface due to remodelling of the membrane. In the wild
type, the water molecules interacting with Arg 276 are in the space created
inside the protein. b, The minimum distance from wild-type Met395 (top) or
mutant Arg 395 (bottom) side chains to any lipid phosphate group (left) or any
water molecule (right) in Charmm Trajectory 3. In H276,395-GltPh, after the
initial equilibration phase, lipid phosphate groups interact with Arg 395 either
directly (5 Å distance) or through water (7.5 Å distance). In the wild type, lipid
head groups remain far from the hydrophobic Met 395 side chain. Water
interacts constantly with Arg 395, but only occasionally with Met 395 (in
protomer B, a water molecule approaches Met 395 from the inside of the
protein, at the interface between transport and trimerization domains). c, The
same set of distances as in b for the mutant, from a different trajectory (G54a7
Trajectory 2) obtained independently, using a different force field. The
same trends are observed as in b, showing proximity to the polar environment.
d, Membrane bending (blue indicates thinning, red indicates thickening) close
to Arg 395 (green) which exposes its side chain to a polar environment
comprised of water molecules and lipid head groups. e, Root mean square
deviation (r.m.s.d.) of the Arg 395 side chain with respect to the crystal
structure after alignment on the trimerization domain, calculated from
Charmm Trajectory 3 and G54a7 Trajectory 2. The side chain initially samples
different conformations before settling into the membrane-exposed position
shown in panel d.

Extended Data Figure 9 | Lipids or detergent molecules stabilize the
unlocked conformation of H276,395-GltPh. a–e, Centre-of-mass distance
between the transport and scaffold domains of protomers A, B, and C of
H276,395-GltPh as a function of molecular dynamics simulation time. The data
are from five independent simulations initiated with position restraints on
the Cα atoms (later released at different time points) and with the domain
interface solvated with water. The vertical green lines indicate the moment in
the corresponding trajectory when position restraints were turned off. Panels
a and b show two repeats of the same starting structure simulated with the
Charmm force field and panel c with Gromos force field. The transport
domains in protomers B and C collapse onto the trimerization domain rapidly
and lose their ligands in some cases (red arrows). d, A simulation, in which lipid
tails partially insert into the interface spontaneously; the unlocked structure is
stable much longer (note the different time scales on the time axis), and the
collapse is only partial. e, The trajectory of a NAMD simulation (Charmm force
field) in which lipid molecules were docked into the interface of protomers
B and C at the time marked by the red arrow (3 lipids per protomer). The lipids
remained in the docked region for the entire duration of the simulation and
stabilized the position of the transport domain. f, g, The best scored docking
poses for a detergent molecule and a POPC lipid, respectively, docked at the
interface of protomer C.

Extended Data Figure 10 | Simulated smFRET data recapitulate
experimental observations. a–d, Simulated FRET efficiency population
contour plots (left side of each panel) and cumulative population histograms
(right side) for wild-type GltPh (a) and H276,395-GltPh (b), and the
corresponding transition density plots (c and d) (see Fig. 2 for corresponding
experimental data). As noted before, there are fewer transitions observed
between the low- and high-FRET states in the wild-type transporter than would
be expected from the model. This may either be because the model does not
recapitulate the noise correctly or it may reflect previously uncharacterized
communication between the protomers that warrants further investigation.
e, f, Dwell time distributions for the low- (left panel) and intermediate- and
high-FRET states (right panels) obtained for wild-type GltPh (e) and
H276,395-GltPh (f) (see Fig. 4 for corresponding experimental data).
Extended Data Table 1 | FRET state assignments and populations; time constants for the slow and fast components


a, FRET State Population Distributions in proteoliposomes

WT GltPh
FRET          Subunit configuration   Apo, %   P(out)=0.55, %   Transport, %   P(out)=0.65, %
Low           OF/OF+OF/IF             54       55               63             65
Intermediate  IF/OF                   27       25               22             22
High          IF/IF                   19       20               15             13

R276S/M395R GltPh
FRET          Subunit configuration   %        %
Lower         OF/OF+OF/IF             40       55
Higher        IF/OF+IF/IF             60       45


b, FRET State Population Distributions in detergent micelles

WT GltPh
FRET          Subunit configuration   Apo, %   P(out)=0.45, %   Bound, %   P(out)=0.5, %
Low           OF/OF+OF/IF             46       45               49         50
Intermediate  IF/OF                   24       25               25         25
High          IF/IF                   29       30               24         25

R276S/M395R GltPh
FRET          Subunit configuration   %        %
Lower         OF/OF+OF/IF             62       30
Higher        IF/OF+IF/IF             38       70


c, Time constants for stable (slow) and transient (fast) FRET States in detergent micelles

WT              Low FRET                 Intermediate / High FRET
                τ_fast, s   τ_slow, s    τ_fast, s   τ_slow, s
Apo             ~0.6        ~6           ~0.6        ~5
Na+, aspartate  ~0.7        ~12          ~0.7        ~15

R276S/M395R     Low FRET    Higher FRET
                τ, s        τ, s
Apo             ~1.5        ~1.1
Na+, aspartate  ~1.7        ~76


a, b, Shown are the assignments of FRET states to configurations of labelled subunit pairs and the corresponding observed populations, rounded to integer numbers. Also shown are the populations calculated
from the probability of a protomer being in the outward-facing state, P(out), assuming independent protomers in the trimer. c, Time constants, τ, of the slow and fast
components for the wild-type transporter were derived by fitting the survival data compiled from the measured dwell times to a double exponential function. The time constants for the H276,395-GltPh mutant were obtained by fitting the
survival data to a single exponential function. Shown are averages from three independent experiments. The standard errors are within 5%. Dwell times longer than 10 s are significantly underestimated because
photobleaching, which occurs with a time constant of ~40 s, limits the observation window.
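The calculated populations in parts a and b follow from simple probability arithmetic. The sketch below is illustrative only: it assumes, as the footnote states, that protomers are independent, and it takes the state-to-configuration assignments from the table (low = OF/OF + OF/IF, intermediate = IF/OF, high = IF/IF). The function name `fret_populations` is hypothetical.

```python
def fret_populations(p_out):
    """Expected FRET-state populations (in %) for a labelled subunit pair,
    assuming each protomer is outward facing (OF) with probability p_out
    independently of its neighbours."""
    p_in = 1.0 - p_out
    low = p_out * p_out + p_out * p_in   # OF/OF + OF/IF
    intermediate = p_in * p_out          # IF/OF
    high = p_in * p_in                   # IF/IF
    return tuple(100.0 * x for x in (low, intermediate, high))


if __name__ == "__main__":
    for p in (0.55, 0.45):
        low, mid, high = fret_populations(p)
        print(f"P(out)={p}: low={low:.2f}%, intermediate={mid:.2f}%, high={high:.2f}%")
```

For P(out) = 0.55 this gives 55%, 24.75% and 20.25%, which round to the 55, 25 and 20 listed for the apo wild-type transporter in proteoliposomes.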
Quasars have long been known to be variable sources at all wavelengths.
Their optical variability is stochastic and can be due to a
variety of physical mechanisms; it is also well-described statistically 
in terms of a damped random walk model’. The recent availability 
of large collections of astronomical time series of flux measurements 
(light curves”~*) offers new data sets for a systematic exploration of 
quasar variability. Here we report the detection of a strong, smooth
periodic signal in the optical variability of the quasar PG 1302-102
with a mean observed period of 1,884 ± 88 days. It was identified
in a search for periodic variability in a data set of light curves for 
247,000 known, spectroscopically confirmed quasars with a temporal 
baseline of about 9 years. Although the interpretation of this phe- 
nomenon is still uncertain, the most plausible mechanisms involve 
a binary system of two supermassive black holes with a subparsec 
separation. Such systems are an expected consequence of galaxy mer- 
gers and can provide important constraints on models of galaxy for- 
mation and evolution. 

Subparsec supermassive black-hole (SMBH) binary systems are not 
resolvable except possibly with long baseline radio interferometry. An 
alternative approach to their detection is through a modulated variability— 
caused by, for example, perturbations in their accretion disks or pre- 
cession of relativistic jets, if they are present (Fig. 1). The best known 
candidate, OJ 287°, has shown a pair of outburst peaks every 12.2 years 
for at least the past century: this object can be interpreted as a second- 
ary SMBH perturbing the accretion disk of the primary SMBH at reg- 
ular intervals’. Systematic searches for equivalent systems to date*"'° have 
attempted to identify them from broad-line velocity offsets in their op- 
tical and near-infrared spectra, but cannot detect the closest pairs (with 
a separation of < 0.1 pc). 

We applied a novel joint wavelet and autocorrelation function (ACF) 
based technique that identifies objects exhibiting strongly periodic be- 
haviour in their light curves (M.J.G. et al., manuscript in preparation) 
to the largest set of quasar time series currently available. These are drawn 
from the Catalina Real-time Transient Survey (CRTS)'***. PG 1302— 
102 (Fig. 2) is the strongest periodic candidate out of 20 objects meet- 
ing the selection criteria: strong constant wavelet peak, strong ACF 
detection of periodic behaviour, sufficient temporal coverage for 1.5 or 
greater cycles at the detected period, and a phased light curve well- 
described by a sinusoid. For statistical comparison, we have also gener- 
ated a simulated light curve for each known quasar based on a damped 
random walk (DRW) model, a standard statistical description of the op- 
tical variability of quasars’, using the CRTS time sampling. We find that 
only one object from the simulated data of 247,000 quasars satisfies the 
same selection criteria, showing that the number of quasars selected is 
statistically significant and that strongly periodic behaviour is not ex- 
pected as an artefact of a DRW process. 

PG 1302-102 has a median V-band magnitude of 15.0 and a redshift
of 0.2784, which gives an absolute V-band magnitude of M_V = −25.81,
assuming the 9-year WMAP cosmology™. It is outside the footprint of
the Sloan Digital Sky Survey but is associated with bright infrared and
X-ray sources. It is also a very bright (720 mJy at 4.86 GHz), core-dominated
flat spectrum radio source. Its optical/near-infrared spectrum
(Fig. 3) shows broad emission lines (Hβ, Hα, Paβ, Paα) with an
inferred mass of log(M/M⊙) = 8.3-9.4 and the object appears to be radiating
at or close to its theoretical Eddington limit (log(L/L_Edd) = 0).
The light curve for the quasar is well-fitted by a sinusoid with an observed
period of 1,884 ± 88 days (corresponding to a rest-frame period
of 1,474 ± 69 days) and an amplitude of ~0.14 mag. CRTS data (covering
~1.8 cycles; that is, May 2005 to the present day) are augmented by
archival monitoring data'*’* available back to May 1993, giving a total
coverage of 4.1 cycles. These data are consistent with the behaviour seen
in the past nine years of CRTS data, particularly with stochastic photometric
variation imposed on a periodic signal. Further simulations
show that the detection is statistically significant, with an observed signal
40 times the scatter from the mean.
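The rest-frame period quoted above follows from dividing the observed period by (1 + z). A minimal sketch of that arithmetic, using the redshift and period given in the text (the function name `to_rest_frame` is illustrative):

```python
REDSHIFT = 0.2784  # redshift of PG 1302-102 quoted in the text


def to_rest_frame(observed_days, z=REDSHIFT):
    """Convert an observed time interval to the quasar rest frame.

    Cosmological time dilation stretches observed intervals by (1 + z).
    """
    return observed_days / (1.0 + z)


if __name__ == "__main__":
    period = to_rest_frame(1884.0)
    error = to_rest_frame(88.0)
    print(f"rest-frame period: {period:.0f} +/- {error:.0f} days")
```

The same factor applies to the uncertainty, which is why 88 days in the observed frame becomes roughly 69 days in the rest frame.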

As PG 1302-102 is bright and nearby, it has featured in a number of 
studies of quasars and their host galaxies. The radio and optical struc- 
ture of the source is noted to be unusual. Hubble Space Telescope (HST) 
imaging” shows that the quasar resides in a luminous elliptical host, as 
typical for radio-loud quasars’®. There are also two companion galaxies 
that lie at projected distances of 3 and 6 kpc. Several features in radio 


Figure 1 | The parameter space of SMBH binary pairs. The expected orbital
periods for SMBH close binary pairs at the specified separations as a function
of total black-hole mass (axes: log10[orbital period (yr)] against
log10[total black-hole mass (M⊙)]). The solid upper line for each separation indicates a
z = 5 track and the solid lower line a z = 0.05 track, while the two internal
dotted lines show z = 1.0 (lower) and z = 2.0 (upper) tracks, respectively.
The hatched region indicates the range over which CRTS has temporal
coverage of 1.5 cycles or more of a periodic signal. The pink shaded region
shows the region of detection for the best CRTS candidate given the range
of virial black-hole masses reported in the literature. Also shown (solid black
star) is the location of the best known SMBH binary candidate, OJ 287°.
Figure 2 | The composite light curve for PG 1302-102 over a period of 7,338 days
(~20 years). The light curve combines data from two CRTS telescopes (CSS and MLS)
with historical data from the LINEAR and ASAS surveys, and the literature’*’* (see
Methods for details). The error bars represent one standard deviation errors on the
photometry values. The red dashed line indicates a sinusoid with period 1,884 days
and amplitude 0.14 mag. The uncertainty in the measured period is 88 days. Note that
this does not reflect the expected shape of the periodic waveform, which will depend
on the physical properties of the system. MJD, modified Julian day.
images of PG 1302-102, such as the small radio core and sharp bends
in the radio structure very close to the central source, correspond with 
features seen in the optical’’. An interpretation is that the host galaxy is 
a fairly old merger, but that there might be more recent activity, with 
the radio source just turning on and possible radio jets just emerging 
from the host galaxy. There may also be some indication of relativistic 
beaming connected with a jet. It should be noted that OJ 287 also ex- 
hibits a similar radio and optical morphology”’. 

PG 1302-102 was spectroscopically monitored over a six-month period
in 1990" and showed no detectable (greater than 5-10%) change in
any component of its spectrum over that time. This lack of variation is 
not inconsistent with the ~60 month period that we have identified. A 
SMBH binary may also exhibit double-peak broad line profiles in its spec- 
trum for a small window of separation between the pair” (although disk 
emission from an accretion disk around a single source may also produce 
the same effect”’). At closer distances, the two black holes dynamically 
affect the broad-line region clouds as a single complex entity producing 
single-peaked spectral lines with asymmetric line profiles. The Balmer 
Figure 3 | The composite spectrum for PG 1302-102. This combines an
archival GALEX spectrum (ultraviolet) with optical/near-infrared spectra
taken with the Keck and Palomar 200 inch telescopes in April and June 2014.
F_λ, flux density. The prominent emission lines are indicated. The median
flux errors are 5.6 × 10^-16 erg s^-1 cm^-2 for the GALEX data (λ < 0.3 μm),
4.5 × 10^-16 and 7.6 × 10^-17 erg s^-1 cm^-2, respectively, for the blue
(0.3 μm < λ < 0.5 μm) and red (0.5 μm < λ < 0.9 μm) optical spectra from
Palomar, and 4.6 × 10^-18 erg s^-1 cm^-2 for the Keck near-infrared
(λ > 0.9 μm) spectrum.


and Paschen series spectral lines in PG 1302-102 do not show a double-peak
profile but are consistently asymmetric (Fig. 4). In particular,
a small bump on the red wing of Hβ has been reported”’, implying a
velocity shift of the order of 200 km s^-1 between the narrow and broad
components of Hβ. One proposed explanation for this is a binary system.
Figure 4 | The profiles of the Balmer and Paschen series lines of PG
1302-102. The data have been modelled with a multi-component line fitting
technique (see Methods for details). a, b, Balmer Hβ (a) and Hα (b) have been
fitted using a narrow component (dashed blue line) and a broad Gaussian
(solid orange line). The dashed green line shows the linear continuum
component, and the total fitted profile is shown as a solid red line. Hβ requires a
single Gaussian offset from the narrow component but Hα requires two
components: a central Gaussian plus a red wing. c, d, The Paschen lines (Paβ
(c) and Paα (d)) also show a consistent small asymmetry on the red side.
The physical interpretation of the periodicity is uncertain, although
its sinusoidal nature suggests that it is kinematic in origin: we consider 
three possibilities here. (1) The optical flux could be the superposition 
of thermal emission from the accretion disk and a non-thermal contri- 
bution from a precessing jet, and such a model can fit the observed data 
(see Methods). The expected precession period with a single SMBH is 
about 107°-10°° years”, much longer than the observed period. Thus, 
a binary SMBH origin for the jet precession is more plausible. In this 
latter case, the jet could precess for two reasons: either as a result of inner 
disk precession due to the tidal interaction of an inclined secondary 
SMBH, or because of the precession of a circumbinary disk warped by 
the SMBH binary. (2) Another possibility is a temporary hotspot in the
inner region of an accretion disk, but this leads to implausible single
SMBH mass estimates of log(M/M⊙) = 11.4-12.2, depending on the
degree of rotation of the SMBH (the largest reported SMBH masses”
are of the order of log(M/M⊙) ~ 10). However, with a SMBH binary,
periodic mass accretion rates can give rise to an overdense lump in the 
inner circumbinary accretion disk**. The spectral energy distribution 
of a circumbinary disk also has a steeper power law” and so accretion 
variations will have a more noticeable effect at shorter wavelengths. 
(3) Yet another possibility is a warped disk eclipsing part of the con- 
tinuum as it precesses, although SMBH binaries are proposed as a pos- 
sible cause for such warped disks**. We note as well that light curves for 
objects known to exhibit these phenomena do not resemble that of PG 
1302—102 (see Extended Data). 

If PG 1302-102 were to be described as a binary SMBH pair with a
total virial mass of log(M/M⊙) ~ 8.5, the observed period gives an upper-
limit separation of ~0.01 pc between the pair. This would mean that
the system has evolved well into the ‘final parsec’ scale. The expecta- 
tion is that most binary SMBH systems will spend the majority of their 
lifetime at such separations (0.01-1 pc), in an intermediate phase of 
evolution between scattering any stars in the nuclear region and grav- 
itational radiation dominance”. 
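The quoted separation can be checked with Kepler's third law. The sketch below is an illustration, not the authors' calculation: it assumes a circular Newtonian orbit, that the rest-frame photometric period (1,474 days) equals the orbital period, and it uses solar units, in which a³ [AU³] = M [M⊙] × P² [yr²]. The function name is hypothetical.

```python
AU_PER_PARSEC = 206264.806  # astronomical units in one parsec
DAYS_PER_YEAR = 365.25


def binary_separation_pc(total_mass_msun, period_days):
    """Orbital separation (pc) from Kepler's third law in solar units:
    a^3 [AU^3] = M [Msun] * P^2 [yr^2]."""
    period_yr = period_days / DAYS_PER_YEAR
    a_au = (total_mass_msun * period_yr ** 2) ** (1.0 / 3.0)
    return a_au / AU_PER_PARSEC


if __name__ == "__main__":
    a = binary_separation_pc(10 ** 8.5, 1474.0)
    print(f"separation ~ {a:.4f} pc")
```

With log(M/M⊙) = 8.5 this evaluates to a little under 0.01 pc, in line with the upper-limit separation stated above.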

Further observations could test the different interpretations men- 
tioned above, particularly reverberation mapping to measure the beha- 
viour of emission line response to continuum variations, which is expected 
to be different for different explanations”. Continued monitoring by 
CRTS and other synoptic surveys will track future cycles, and historical 
photometric data from photographic plate collections may provide more 
data for previous ones. With decadal baselines, the predicted change in 
period of the system may be detectable. Future spectroscopic observa- 
tions could also test whether the line asymmetries vary on binary orbital 
timescales. Multiwavelength observations should provide more infor- 
mation about the innermost regions of the quasar and the nature of the 
jet. The relationship between PG 1302—102 and its two nearby com- 
panions may also furnish insight into the merger history of this source, 
particularly as these may contain similarly sized SMBHs. Finally, if PG 
1302—102 is a SMBH binary, it is a strong candidate for any gravita- 
tional wave experiment sensitive to nanohertz frequency waves, such 
as those using pulsar timing arrays and any future space-borne grav- 
itational wave detection mission. 
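To see why such a system would fall in the nanohertz band, one can estimate the gravitational-wave frequency. This is a rough sketch under assumptions that go beyond the text: the orbit is taken to be circular, so the dominant emission is at twice the orbital frequency, and the observed 1,884-day photometric period is taken to be the orbital period. The function name is illustrative.

```python
SECONDS_PER_DAY = 86400.0


def gw_frequency_nhz(orbital_period_days):
    """Dominant gravitational-wave frequency (nHz) of a circular binary,
    which is twice the orbital frequency."""
    period_s = orbital_period_days * SECONDS_PER_DAY
    return 2.0 / period_s * 1e9


if __name__ == "__main__":
    print(f"f_GW ~ {gw_frequency_nhz(1884.0):.1f} nHz")
```

Under these assumptions the frequency is of the order of 10 nHz, within the range targeted by pulsar timing arrays.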


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

CRTS data. The Catalina Real-time Transient Survey (CRTS) makes use of the Cat- 
alina Sky Survey which began in 2004, operated by the Lunar and Planetary Labo- 
ratory at the University of Arizona, and uses three telescopes (designated CSS, MLS 
and SSS) to cover the sky between declination δ = −75° and +65° (~80% of the
sky) in order to discover near-Earth objects and potentially hazardous asteroids. 
The full Catalina surveys data set contains time series for approximately 500 mil- 
lion sources to a limiting magnitude of V ~ 20 with an average of ~250 observa- 
tions over a baseline of 9 years per source. CRTS operates an open data policy, and 
the data are publicly available at: http://catalinadata.org. 

For photometric calibration we combine observations taken with all three Catalina
Sky Survey telescopes since no difference was found between these systems.
This is not surprising since each telescope uses the same type of 4k × 4k
CCD camera and observations with all three telescopes are calibrated using the
same software pipeline. All Catalina observations are transformed to Johnson V
based on 50-100 stars selected as G-type stars using 2MASS” colours. For bright
stars, this provides repeated photometry accurate to ~0.05*’. However,
as the photometry is unfiltered, there are significant variations with object colour.

The 2007 Landolt UBVRI standard star catalogue” provides 109 stars centred
near declination −50° in the magnitude range 10.4 < V < 15.5 and in the colour
index range −0.33 < (B − V) < 1.66, while the 2009 Landolt catalogue** provides
202 standard stars along the celestial equator in the magnitude range 8.90 < V <
16.30, and the colour index range −0.35 < (B − V) < 2.30, along with 393 standard
stars from previous standard star catalogues. In total there are 445 Catalina light
curves matching Landolt standards. On average each standard is measured 134 times. 
A handful of stars that appeared to exhibit significant variability were removed. 

Median magnitudes were calculated for each light curve and these were then
used to determine the following transformation equations between Johnson V and
Catalina V_CSS:

$$V = V_{\rm CSS} + 0.31\,(B - V)^2 + 0.04$$

$$V = V_{\rm CSS} + 0.91\,(V - R)^2 + 0.04$$

$$V = V_{\rm CSS} + 1.07\,(V - I)^2 + 0.04$$

The dispersions in the fits to these transformations are 0.059, 0.056 and
0.063 mag, respectively, for V < 16.
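As an illustration of how these transformations would be applied, the sketch below converts an unfiltered Catalina magnitude to Johnson V given a colour index. It assumes the damaged exponent on the colour term in the extracted equations is a square; the coefficients are those quoted above, and the names `COLOUR_COEFFS` and `css_to_johnson_v` are hypothetical.

```python
# Colour-term coefficients quoted in the text, keyed by colour index.
COLOUR_COEFFS = {"B-V": 0.31, "V-R": 0.91, "V-I": 1.07}
ZERO_POINT = 0.04  # constant offset common to all three relations


def css_to_johnson_v(v_css, colour, index="B-V"):
    """Transform a Catalina magnitude V_CSS to Johnson V.

    Assumes a quadratic colour term: V = V_CSS + k * colour**2 + 0.04.
    """
    return v_css + COLOUR_COEFFS[index] * colour ** 2 + ZERO_POINT


if __name__ == "__main__":
    print(css_to_johnson_v(15.0, 0.5, "B-V"))
```

Because quasar colours vary with brightness, a single colour value is only an approximation when this is applied to a full light curve.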
Candidate selection. We applied the weighted wavelet transform™ and the 
z-transform discrete correlation function (ZDCF)’* to the CRTS light curves of 
spectroscopically confirmed quasars. Both of these algorithms can detect (quasi-) 
periodic behaviour in irregularly sampled data. We define the period of the quasar 
from the largest peak in the ZDCF between the second and third zero-crossings of 
a Gaussian process model fit to the ZDCF”*. We have verified that this agrees with 
periods determined by other methods. The period uncertainty is defined as:

$$\sigma = \frac{1.483 \times \mathrm{MAD}}{\sqrt{N - 1}}$$

where MAD is the median of the absolute deviations from the median of the time
intervals between successive peaks in the ZDCF and N is the total number of peaks
considered.
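A minimal sketch of this uncertainty estimate follows. It assumes the damaged denominator in the extracted formula is √(N − 1), and it takes a list of ZDCF peak positions (in days) as input; the function name and the example peak times are made up for illustration.

```python
import math
import statistics


def period_uncertainty(peak_times):
    """Robust period uncertainty from successive ZDCF peak positions.

    Uses 1.483 * MAD / sqrt(N - 1), where MAD is the median absolute
    deviation of the intervals between successive peaks and N is the
    number of peaks considered.
    """
    intervals = [b - a for a, b in zip(peak_times, peak_times[1:])]
    centre = statistics.median(intervals)
    mad = statistics.median([abs(x - centre) for x in intervals])
    return 1.483 * mad / math.sqrt(len(peak_times) - 1)


if __name__ == "__main__":
    peaks = [0.0, 1880.0, 3770.0, 5650.0, 7540.0]  # illustrative values only
    print(period_uncertainty(peaks))
```

The factor 1.483 scales the MAD to the standard deviation of a Gaussian distribution, so the result behaves like a standard error on the mean peak spacing.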

We only considered those objects whose wavelet peak significance places them
in the top quartile of the data set. Most kinematically
caused variations should manifest as a (near-)Keplerian signal and so we also use the
r.m.s. scatter around an expected sinusoidal waveform (best-fit truncated Fourier
series of up to 6 terms) with the ZDCF period to identify those objects most closely
exhibiting the expected behaviour. We excluded those quasars where the scatter is
greater than the 1σ lower limit on the median absolute deviation of the light curve,
that is, r.m.s./MAD > 0.67, since we need to account for the intrinsic variability of
the quasar as well. We also restricted our selection to candidates with temporal
coverage of more than 1.5 cycles, assuming the ZDCF period.
Simulated data. For each quasar in our data set, we have generated a simulated light
curve assuming that it follows a damped random walk (DRW) model. Using the
actual observation times, t_i, we replace the observed magnitudes with those that
would be expected under a DRW model. The magnitude X(t) at a given timestep Δt
from a previous value X(t − Δt) is drawn from a Gaussian distribution with mean
and variance’:

$$E(X(t)\,|\,X(t - \Delta t)) = e^{-\Delta t/\tau}\,X(t - \Delta t) + b\tau\left(1 - e^{-\Delta t/\tau}\right)$$

$$\mathrm{Var}(X(t)\,|\,X(t - \Delta t)) = \frac{\tau\sigma^{2}}{2}\left[1 - e^{-2\Delta t/\tau}\right]$$
We add a Gaussian deviate normalized by the photometric error associated with
the magnitude to be replaced at each time t to incorporate measurement uncertainties
into the mock light curves. For each light curve, we set bτ to its median value
and use the rest frame DRW fitting functions”:

$$\log f = A + B \log\left(\frac{\lambda_{RF}}{4000\,\text{Å}}\right) + C\,(M + 23) + D \log\left(\frac{M_{BH}}{10^{9} M_\odot}\right)$$

where (A, B, C, D) = (−0.51, −0.479, 0.113, 0.18) for f = SF_∞ = τσ²/2 and (A, B, C,
D) = (2.4, 0.17, 0.03, 0.21) for f = τ. M is the absolute magnitude of the quasar and
λ_RF is the rest-frame wavelength of the filter. The mass of the black hole is either
the measured virial mass** or is drawn from a Gaussian distribution’:

$$p(\log M_{BH}\,|\,M) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left[-\frac{(\log M_{BH} - \mu)^{2}}{2\sigma^{2}}\right]$$

where μ = 2.0 − 0.27M and σ = 0.58 + 0.011M.
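The mock light curve construction described in this section can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it uses the standard DRW conditional mean and variance (mean relaxing to bτ with decay e^(−Δt/τ), variance (τσ²/2)[1 − e^(−2Δt/τ)]), starts the walk from the stationary distribution (an assumption not stated in the text), and adds Gaussian noise scaled by the per-point photometric errors. The function and parameter names are hypothetical.

```python
import math
import random


def simulate_drw(times, tau, sigma, mean_mag, errors=None, rng=None):
    """Draw a damped-random-walk light curve at the given observation times.

    tau is the damping timescale (same units as times), sigma the driving
    amplitude, and mean_mag plays the role of b*tau (the long-term mean).
    """
    rng = rng or random.Random()
    stationary_var = tau * sigma ** 2 / 2.0
    # Assumption: the first point is drawn from the stationary distribution.
    x = rng.gauss(mean_mag, math.sqrt(stationary_var))
    mags = [x]
    for t_prev, t_now in zip(times, times[1:]):
        decay = math.exp(-(t_now - t_prev) / tau)
        cond_mean = decay * x + mean_mag * (1.0 - decay)
        cond_var = stationary_var * (1.0 - decay ** 2)  # decay**2 = exp(-2*dt/tau)
        x = rng.gauss(cond_mean, math.sqrt(cond_var))
        mags.append(x)
    if errors is not None:
        # Add measurement noise scaled by each point's photometric error.
        mags = [m + rng.gauss(0.0, e) for m, e in zip(mags, errors)]
    return mags


if __name__ == "__main__":
    obs_times = [0.0, 3.0, 30.0, 31.0, 400.0, 410.0]  # irregular sampling, days
    print(simulate_drw(obs_times, tau=200.0, sigma=0.01, mean_mag=15.0,
                       errors=[0.05] * 6, rng=random.Random(42)))
```

Because each step depends only on the elapsed time since the previous observation, the recursion handles irregular sampling such as the CRTS cadence without any interpolation.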
Statistical significance. To assess the statistical significance of the detection of PG 
1302—102, we generated 1,000 simulated light curves as above. From these we de- 
termined the mean weighted wavelet power spectrum as a function of time and fre- 
quency and its variance. The dominant signal in the observed WWZ spectrum of 
PG 1302-102 is then seen to be ~40 times above the corresponding mean DRW 
value in terms of the expected standard deviation. 

We have also performed a periodicity analysis of the light curve of PG 1302— 
102 using the generalized Lomb-Scargle method which shows a statistically sig- 
nificant peak at the same period identified by the wavelet and autocorrelation 
analyses. The false alarm probability is <10- '* — the 10” "* level is P() = 0.335 
and the observed peak is at P(w) = 0.818. 
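For readers unfamiliar with periodograms of irregularly sampled data, the sketch below implements the classical (fixed-mean) Lomb-Scargle periodogram in plain Python. Note that this is a simpler relative of the generalized Lomb-Scargle method used in the text, which additionally fits a floating mean, and its variance normalization differs from the P(ω) values quoted above. The function names and the trial-period grid are illustrative.

```python
import math


def lomb_scargle(times, values, periods):
    """Classical Lomb-Scargle power at each trial period (variance-normalized)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    resid = [v - mean for v in values]
    powers = []
    for period in periods:
        w = 2.0 * math.pi / period
        # Time offset that makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum(r * c for r, c in zip(resid, cos_terms))
        ys = sum(r * s for r, s in zip(resid, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        powers.append(0.5 * (yc * yc / cc + ys * ys / ss) / var)
    return powers


def best_period(times, values, periods):
    """Return the trial period with the highest Lomb-Scargle power."""
    powers = lomb_scargle(times, values, periods)
    return periods[powers.index(max(powers))]
```

Applied to a noisy sinusoid sampled at random epochs, `best_period` should recover the input period to within the spacing and resolution of the trial grid; assessing a false alarm probability requires additional steps not shown here.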

Theoretical predictions. Simple disk models for circumbinary gas and the binary- 
disk interaction have been used” to consider the number of SMBH binaries expected 
in a variety of surveys, assuming that such objects are in the final gravitational-wave-
dominated phase of coalescence (this equates to separations less than ~0.01 pc for
a 10°Mo SMBH binary). This approach has been combined*° with merger tree 
assembly models to similarly predict the number of expected SMBH binaries at 
wider separations where spectral line shifts may be seen (this equates to separations 
greater than ~0.2 pc for a 10°M. SMBH binary). The latter shows that in a sample
of 10,000 quasars at z < 0.7, there should be ~10 objects and this number increases
by a factor of ~5-10 for z < 1. We note, however, that these theoretical arguments
are still subject to considerable uncertainties; for example, if the final parsec prob- 
lem cannot be resolved then there will not be any binaries in the ~0.01 pc regime. 

Assuming a limiting magnitude of V ~ 20, a detectable range of orbital periods 
from 20-300 weeks (spanning both GW- and gas-dominated regimes), a survey 
sky coverage of 2π steradians, and a redshift range of 0.5-4.5, we would expect 450
SMBH binaries following these approaches. Our finding of 20 candidates from a 
sample of 240,000 quasars is therefore conservative. 89,000 quasars in our sample 
also have virial black-hole mass estimates** (23% at z > 2) and if we assumed that 
each of these was a SMBH binary with a separation of 0.01 pc then the CRTS tem- 
poral baseline is sufficient to detect 1.5 cycles or more in 63% of them (including 
55% of the z > 2 population). Our search is therefore sensitive to a large fraction of 
the close SMBH binary population. 

We note that our approach assumes that periodicity associated with SMBH 
binaries manifests in a Keplerian form. If there is a larger set of non-Keplerian 
periodic SMBH binaries, either flaring, such as OJ 287, or not, then the 20 objects we 
have identified may be a small sample of the total close binary SMBH population. 
Archival data. LINEAR data’. These were calibrated with pre-release photometry 
from Pan-STARRS using the g, r and i bandpasses. Comparison stars with instru- 
mental magnitudes (ccd_mag) between 14 and 17 were selected within 0.1° of PG 
1302—102 in LINEAR images. g — i colours were used to compute an r-band cor- 
rection so that a calibration star with g— i= 0 has an instrumental magnitude, 
ccd_mag = r. Zeropoints for each frame were then derived based on these stars. 
The reported bandpass for the calibrated magnitudes is therefore approximately r. 
Magnitude errors are computed by SEXTRACTOR, with typical r.m.s. errors be- 
tween frames of ~0.1. 

ASAS data. The nominal limiting magnitude for ASAS’ is I ~ 13 and so PG 1302— 
102 is very close to the detection threshold. The low signal-to-noise ratio for such 
an object is the primary cause of the large degree of scatter seen in ASAS data for 
this source. 

Historic data. Such data for PG 1302-102 from previous quasar monitoring campaigns
are available in the literature’*’*. To put all data on the same photometric scale, offsets
were applied to account for differences in the photometric systems used. Regions
of temporal overlap between a pair of data sets were used to derive offsets so that
both data sets had the same median value. Where no temporal overlap exists, the phased
light curve was used to determine the median offset.

Earlier individual photometric observations of PG 1302-102 also exist but the observational
errors on these are typically ~0.1 mag and so it is difficult to determine
whether they agree with the extrapolated behaviour. They also tend to be in different
passbands, which requires colour terms to convert to the V passband to which
CRTS is calibrated. However, colour terms for quasars are known to vary (“bluer
when brighter”) so a constant value cannot be assumed (the quoted (B − V) values
for PG 1302-102 have a range of at least 0.2 mag), which introduces an additional
error to the transformed magnitude. Such historical data are thus of limited utility.
Spectroscopic data. An optical spectrum was obtained using the Double Spectrograph
on the Hale 200-inch telescope at Palomar Observatory on UT 2014 April
22. We obtained two 250-s exposures in cloudy conditions using the 1.0″ wide slit,
the 5,500 Å dichroic, the 600 ℓ mm^-1 grating on the blue arm (λ_blaze = 4,000 Å),
and the 316 ℓ mm^-1 grating on the red arm (λ_blaze = 7,500 Å).

On UT 2014 May 26, we obtained additional spectroscopy of PG 1302-102 using
the Low Resolution Imaging Spectrometer (LRIS*') on the Keck I telescope. We
obtained two 300-s exposures in non-photometric conditions, using the 1.5″ slit,
the 5,600 Å dichroic, the 600 ℓ mm^-1 grism on the blue arm (λ_blaze = 4,000 Å) and
the 400 ℓ mm^-1 grating on the red arm (λ_blaze = 8,500 Å).

These set-ups provided moderate resolution spectra across the entire optical window,
3,100 Å to 1 μm. The data from both telescopes were reduced using standard
procedures and calibrated using observations of standard stars obtained on the
same (non-photometric) nights.
Near-infrared. We obtained a near-infrared spectrum of PKS 1302-102 with the
TripleSpec instrument on the Hale 200-inch telescope at Palomar Observatory on
UT 2014 April 15. Conditions were clear and the seeing was ~1 arcsec. The source
was observed at an airmass of 1.3890 for four 300-s exposures in
an ABBA dither pattern for a total of 20 min of on-source exposure. A spectrum of
the telluric standard A0V star HD 112304 was obtained immediately following the
source spectrum at an airmass difference of 0.15. We reduced the data using a
modified version of the Spextool data reduction package” which also performs a
telluric correction using the standard star spectrum’.

Figure 3 shows the combined optical and near-infrared spectrum, in which we
scaled the (non-photometric) optical spectrum by a factor of 2.86 to meet the near-
infrared spectrum in their mutually overlapping wavelength region (0.96-1.05 μm).
The change in flux is probably primarily due to a combination of weather variations
and slit losses. The Balmer lines, Hβ and Hα, and Paschen lines, Paβ and Paα, are
marked with vertical dashed lines.

System parameters. To estimate the (total) black-hole mass for this source, we 
used the standard method of single epoch virial black-hole estimation“ and adapted 
relations derived for Paschen lines in the near-infrared**. To determine the full- 
width at half-maximum (FWHM) of the broad line component of the four lines 
marked in Fig. 4, we applied a multi-component line fitting technique**” in which
we model the narrow-line component of the line profiles by first fitting the [O III]
4,959, 5,007 Å lines and fixing the width of a narrow line component in each profile.
For Hα we also include the [N II] 6,548, 6,583 Å doublet, fixed at a flux ratio of
2.96. The broad component can be modelled by up to three Gaussians. If the ratio
of χ² values for successive fits is greater than 0.8, an additional component is added.
The near-infrared spectrum has an error array which is used in the line modelling 
and parameter estimation (see below). For the optical spectrum, an error array is 
estimated from the median difference between adjacent pixels (R. White, personal 
communication). 

The measured values from the 2014 data are: 4,450 ± 150 km s^-1 for Hβ, 2,520 ±
30 km s^-1 for Paα, and 3,200 ± 20 km s^-1 for Paβ. Errors on the FWHM were
computed using a Monte Carlo approach: the best-fit model for a line was perturbed
with a random draw from the error array at each wavelength element and a new fit
made. This was repeated 100 times for each line and the standard deviation of the
broad line component FWHM used as the error. These values give a (total) mass
for the SMBH of log(M/M⊙) = 8.8 ± 0.6 (Hβ), 8.5 ± 0.1 (Paβ), and 8.4 ± 0.1 (Paα).
Previously reported values of the FWHM of Hβ give a range of estimates for the
SMBH mass in the literature using various techniques of log(M/M⊙) = 8.3-9.4, so
our results are consistent with these.

We note that the spectral fits of Hβ and Pa are not perfect. However, the uncertainties
in the FWHM and continuum luminosity that this introduces are small
compared to the broad range of SMBH mass estimated from the different lines.
Alternative interpretations. We present further discussion here on the alterna- 
tive interpretations considered. We note that none of the objects mentioned pass 
our candidate selection criteria. 
Jet related. The optical flux could be the superposition of a thermal contribution from
the accretion disk with a non-thermal contribution produced by the underlying jet.
The flux density of the jet (in the optically thin regime) will be boosted in the
observer’s frame relative to the co-moving frame:

$$S_j(\nu) = S'_j(\nu)\,\delta(\phi, \gamma)^{p + \alpha}$$

where α is the spectral index (S(ν) ∝ ν^−α) and p = 2 for a continuous jet. The jet is
precessing with constant angular velocity ω, has an opening angle Ω and an axis
defined by the angles φ_0 (between the jet axis and the line of sight) and η_0 (the
position angle in the plane of the sky):

$$\sin^{2}\phi = (\sin\Omega\cos\omega t + \cos\Omega\sin\phi_0\sin\eta_0)^{2} + (\sin\Omega\cos\phi_0\sin\omega t + \cos\Omega\sin\phi_0\cos\eta_0)^{2}$$

Assuming a constant Lorentz factor γ for the relativistic bulk motion of the jet,
γ = (1 − β²)^(−1/2), the Doppler factor is given by δ = γ^−1 (1 − β cos φ)^−1. Modelling
the light curve in this way, we get best-fit parameters of: γ = 5.4 ± 0.1, Ω = 0.5°
± 0.1, φ_0 = 5.0° ± 0.2, and η_0 = 0.6° ± 1.4 (assuming α = 1.66).
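The precessing-jet model above can be evaluated numerically as a check on its behaviour. The sketch below is an illustration under several assumptions: the damaged symbols in the extracted equations are read as the Doppler factor δ, viewing angle φ, opening angle Ω, axis angles φ_0 and η_0, and boosting exponent p + α; the viewing angle is taken to be below 90° so that cos φ is the positive root; and the precession period is supplied by the caller. Function and parameter names are hypothetical.

```python
import math


def doppler_factor(t, period, gamma, opening_deg, phi0_deg, eta0_deg):
    """Doppler factor of a precessing jet at time t (same units as period)."""
    w = 2.0 * math.pi / period  # precession angular velocity
    om = math.radians(opening_deg)
    phi0 = math.radians(phi0_deg)
    eta0 = math.radians(eta0_deg)
    a = math.sin(om) * math.cos(w * t) + math.cos(om) * math.sin(phi0) * math.sin(eta0)
    b = (math.sin(om) * math.cos(phi0) * math.sin(w * t)
         + math.cos(om) * math.sin(phi0) * math.cos(eta0))
    sin2_phi = a * a + b * b
    # Assumption: the viewing angle stays below 90 degrees.
    cos_phi = math.sqrt(max(0.0, 1.0 - sin2_phi))
    beta = math.sqrt(1.0 - 1.0 / gamma ** 2)
    return 1.0 / (gamma * (1.0 - beta * cos_phi))


def boost(delta, alpha=1.66, p=2.0):
    """Flux boosting factor delta**(p + alpha) for a continuous jet (p = 2)."""
    return delta ** (p + alpha)


if __name__ == "__main__":
    for day in (0.0, 471.0, 942.0, 1413.0):
        d = doppler_factor(day, 1884.0, 5.4, 0.5, 5.0, 0.6)
        print(day, round(d, 3), round(boost(d), 1))
```

Because the boost scales as a high power of δ, even the small change in viewing angle produced by a 0.5° opening angle yields a noticeable periodic modulation of the jet flux.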

A number of radio-loud quasars have been reported**-*° as showing periodic 
variability in their radio light curves. While a SMBH binary could explain this, a 
more likely explanation is shock interaction with a helical jet or precession of a jet. 
However, the optical light curves of these objects (see Extended Data Fig. 1) do not 
show the distinctive behaviour seen in that of PG 1302—102 suggesting that a dif- 
ferent physical mechanism is more likely. We note as well that of the 20 objects in 
our full sample showing optical periodicity, only 3 are associated with a radio source. 
Warped accretion disks. These have been observed in a handful of AGN*'* and 
the suggestion here is that as a warp precesses, it could obscure a small amount of 
continuum emission which would then appear quite regular. Again there is no indi- 
cation of any periodic behaviour in the CRTS light curves available for known objects 
with warped disks (see Extended Data Fig. 2) similar to that seen in PG 1302—102. 

PG 1302-102 shows a 14% variation in flux, which would suggest that the size
of the warp in the disk is quite large. This would also be an orientation-dependent
phenomenon and, as the source is a blazar, its accretion disk should be oriented close
to face-on to us and so any obscuring factor should be limited in effect. We also
note that many stellar systems with warped accretion disks are resolvable binary
systems.
Extended Data Figure 1 | The optical light curves of quasars showing radio
periodicity. Shown are the CRTS light curves for 11 quasars reported
to show periodicity in their radio emission. Each light curve has been
normalized to zero mean and individual curves are offset by a constant of
1.5 mag from each other. The data are split across two panels for ease of
viewing. Error bars shown are standard 1σ photometric errors. The CRTS light
curve of PG 1302-102 (solid black stars) is also shown for comparison.

Extended Data Figure 2 | The optical light curves of quasars with
warped accretion disks. Shown are the CRTS light curves for 6 quasars
reported to have warped accretion disks. Each light curve has been
normalized to zero mean and individual curves are offset by a constant of
0.5 mag from each other. Error bars shown are standard 1σ photometric
errors. The CRTS light curve of PG 1302-102 (solid black stars) is also shown
for comparison.


©2015 Macmillan Publishers Limited. All rights reserved 




doi:10.1038/nature14144 


Brittle intermetallic compound makes ultrastrong 
low-density steel with large ductility 


Sang-Heon Kim¹, Hansoo Kim¹ & Nack J. Kim¹


Although steel has been the workhorse of the automotive industry 
since the 1920s, the share by weight of steel and iron in an average 
light vehicle is now gradually decreasing, from 68.1 per cent in 1995 
to 60.1 per cent in 2011 (refs 1, 2). This has been driven by the low 
strength-to-weight ratio (specific strength) of iron and steel, and the 
desire to improve such mechanical properties with other materials. 
Recently, high-aluminium low-density steels have been actively studied 
as a means of increasing the specific strength of an alloy by reducing 
its density’ >. But with increasing aluminium content a problem is 
encountered: brittle intermetallic compounds can form in the result- 
ing alloys, leading to poor ductility. Here we show that an FeAl-type 
brittle but hard intermetallic compound (B2) can be effectively used 
as a strengthening second phase in high-aluminium low-density steel, 
while alleviating its harmful effect on ductility by controlling its 
morphology and dispersion. The specific tensile strength and ductil- 
ity of the developed steel improve on those of the lightest and stron- 
gest metallic materials known, titanium alloys. We found that alloying 
of nickel catalyses the precipitation of nanometre-sized B2 particles 
in the face-centred cubic matrix of high-aluminium low-density steel 
during heat treatment of cold-rolled sheet steel. Our results dem- 
onstrate how intermetallic compounds can be harnessed in the alloy 
design of lightweight steels for structural applications and others. 

There is increasing demand for a broad range of structural materials 
for environmentally benign, energy-efficient, lightweight engineering 
systems. The balance of lightness, strength and ductility in metallic alloys 
has been explored since the Bronze Age. Unfortunately, strength and 
ductility are mutually exclusive. Further, higher specific strength is ex- 
tremely difficult to achieve because strength invariably decreases as den- 
sity is reduced. However, the alloying of iron with aluminium results in 
higher specific strength even though density is reduced. Until now, 
low-density steel has been studied*’ mostly in systems based on Fe-Al 
and Fe-Al-Mn-C. In particular, low-density steel with fairly high spe- 
cific strength has been produced with alloys based on Fe-Al-Mn-C (the 
so-called TRIPLEX steels) using a microstructure consisting of austenite
(face-centred cubic) matrix and finely dispersed nanometre-sized
κ-carbides of the (Fe,Mn)3AlC type. However, the level of specific
strength attainable by this microstructure was not comparable to those
of light materials such as aluminium and titanium alloys. This was due
to the low strain hardening rate of the Fe-Al-Mn-C alloys containing
κ-carbides, which are easily shearable by gliding dislocations.

One of the general concepts employed until now in the alloy design 
of Fe-Al-Mn-C-based, high-aluminium, low-density steel has been the 
suppression of ‘brittle’ intermetallic compound formation by stabilizing 
the ‘ductile’ austenite matrix*** (this stabilization is achieved by alloying 
carbon and manganese). Instead, here we have actively utilized the brittle 
intermetallic compound B2 by modifying its morphology in the steel 
matrix. Despite their poor plasticity at ambient temperature in the bulk 
state, FeAl-based intermetallic compounds offer an attractive combi- 
nation of physical and mechanical properties such as low density and 
good corrosion, oxidation and/or wear resistance’ '*. To take advant- 
age of B2, we devised an alloy design in which B2 is dispersed as a sec- 
ond phase in the austenite matrix on the basis of the ‘divide and rule’ 


principle, which is analogous to harnessing ‘brittle’ martensite as a 
strengthening second phase in the ferrite (body-centred cubic) matrix 
of dual-phase steels'*”’. 

A common method of uniformly distributing fine particles in a ma- 
trix is to make the best use of highly potent nucleation sites for inducing 
the precipitation of the particles. In this study, potential nucleation sites 
for B2 during annealing of wrought sheet steel include (1) grain bound- 
aries or edges of recrystallized austenite crystals and (2) deformation 
shear bands, which are common in hot- or cold-worked low-density 
steel. To expand the stability domain of B2 above the recrystallization
temperature (normally, 800-900 °C) of deformed austenite, the
alloying recipe of an austenitic low-density steel was modified by add- 
ing 5 weight per cent nickel (Ni), which is one of the most effective 
elements for forming B2 with aluminium’*””. The addition of Ni to low- 
density steel may appear to conflict with the collective wisdom of ferrous 
alloy design; Ni has been regarded merely as a well-known austenite 
stabilizer like Mn and C; and Ni has been little noticed in low-density 
steel design, mainly because it is not a critical determinant of the den- 
sity in ferrous alloys. 

The steel under investigation here was produced using an induction
melting furnace. About 40 kg was melted in a protective argon atmosphere
and cast to a rectangular ingot (300 mm width, 80 mm thickness,
240 mm length). After homogenization treatment at 1,150 °C for 2 h,
the ingot was hot-rolled with a starting temperature of 1,050 °C to hot
strips 3 mm in thickness. Then, the hot-rolled strips were cold-rolled
to final sheets 1 mm in thickness. The cold-rolled sheets were annealed
at 870-900 °C for 2-60 min and immediately water-quenched or continuously
cooled down to 25 °C at the rate of 30 °C s⁻¹. The chemical
composition of the present steel and selected reference materials was measured
by wet chemical analysis and is given in Extended Data Table 1.
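The thicknesses quoted for the ingot, hot strip and cold-rolled sheet imply the rolling reductions directly. The short sketch below works this out; the helper name is illustrative, and it assumes the 80 mm ingot thickness is the dimension reduced during hot rolling.

```python
def thickness_reduction(initial_mm, final_mm):
    """Percentage reduction in thickness for one rolling step."""
    if initial_mm <= 0 or not 0 < final_mm <= initial_mm:
        raise ValueError("thicknesses must satisfy 0 < final <= initial")
    return 100.0 * (initial_mm - final_mm) / initial_mm


hot = thickness_reduction(80.0, 3.0)   # ingot to hot strip
cold = thickness_reduction(3.0, 1.0)   # hot strip to cold-rolled sheet
print(f"hot rolling: {hot:.2f}% reduction")
print(f"cold rolling: {cold:.2f}% reduction")
```

The cold-rolling step therefore removes about two-thirds of the strip thickness before annealing, which is the heavily deformed state in which the shear bands discussed in the text act as nucleation sites.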

The reference materials—the press hardening steel and the titanium 
alloy of Ti6Al4V—were obtained from materials suppliers. Flat tensile 
specimens with a gauge dimension of 12.5 mm width by 50 mm length 
by 1 mm thickness were machined in such a way that the tensile axis is 
parallel to the rolling direction. Tensile tests were carried out at an initial
strain rate of 10⁻³ s⁻¹ at ambient temperature. The density of our
steel and of the selected reference materials was measured by pycnometry.
The thermal treatment conditions and the properties of our steel
and of the reference materials are listed in Extended Data Table 2. The 
phases present in the microstructure of our steel were identified by 
X-ray diffraction (Extended Data Fig. 1). Microstructural investigations 
were performed by scanning electron microscopy (SEM) and transmis- 
sion electron microscopy (TEM) coupled with energy dispersive spec- 
troscopy (EDS). 

Figures 1a and b show SEM images of the novel high-specific-strength 
steel (HSSS) of the present study, comparing the microstructures of as- 
cold-rolled and annealed sheets, respectively (see Extended Data Tables 1 
and 2 and Extended Data Fig. 1). The as-cold-rolled microstructure 
consists of austenite matrix and B2 stringer bands parallel to the rolling 
direction (Fig. 1a), but the microstructure dramatically changes after 
15 min of annealing at 900 °C, precipitating out fine B2 particles in be- 
tween the B2-stringer bands in the steel matrix (Fig. 1b). B2 in the 
Figure 1 | Precipitation of B2 particles during annealing of cold-rolled
Fe-10%Al-15%Mn-0.8%C-5%Ni (weight per cent) high-specific-strength
steel. a, As-cold-rolled microstructure consisting of austenite matrix (γ) and
B2 stringer bands. RD, rolling direction; ND, normal direction. b, Annealed 
microstructure having fine B2 precipitates in between the retained B2 bands in 
austenite matrix. c, Scanning TEM image of the annealed high-specific- 
strength steel (HSSS) showing morphologies of B2 particles. The inset shows 
the selected area diffraction pattern of a B2 precipitate. d, Partitioning of 
alloying elements between B2 precipitate and austenite matrix. e, Sketches 
illustrating the formation mechanism of B2 precipitates of types 2 and 3 in b. 


annealed sheet (Fig. 1b) has three different morphologies, comprising 
retained stringer bands (type 1), fine particles of size 200-1,000 nm (type 2) 
and finer particles of size 50-300 nm (type 3). A scanning TEM image 
of Fig. 1c clearly shows the size difference of B2 particles of types 2 and 
3. The inset of Fig. 1c shows the selected area diffraction pattern of a B2 
particle. Partitioning of each element between B2 and austenite is illus- 
trated in Fig. 1d, showing EDS composition profiles of Fe, Al, Ni and 
Mn across the interface of B2 and austenite (γ). Al and Ni are enriched
in B2 while the Mn content is higher in austenite (γ). Figure 1e schematically
illustrates the formation mechanism of B2 precipitates of types
2 and 3. If a cold-worked metal is heated to a sufficiently high temperature,
strain-free new grains are formed by recrystallization, replacing
deformed ones. Owing to the inhomogeneous nature of plastic deformation
at a fine scale, the recrystallization kinetics of microscopically
local areas can be substantially different in deformed steel matrix. When 
the local recrystallization proceeds quickly during annealing, B2 pre- 
cipitates at recrystallized fine austenite grain boundaries (or edges) to 
form the type 2 microstructure. Otherwise, B2 precipitates along the 
shear bands in non-recrystallized coarse austenite grains, resulting in 
the type 3 microstructure. 

Compared in Fig. 2 are the room-temperature tensile properties of 
the ductile HSSS of the present study and selected metallic alloys of high 
specific strength®’”**” (Extended Data Tables 1 and 2). Figure 2a shows 
the representative stress-strain curves of HSSS annealed under various 
conditions in comparison with Fe-Al-Mn-C-based high-aluminium, 
low-density steel, a commercial titanium alloy (Ti6Al4V) and a press
hardening (fully martensitic) boron steel’*. Flow curves of HSSS show 
large ductility and phenomenally high strain hardening capability even 
at ultrahigh yield strength levels of over 1 GPa. Figure 2b shows the re- 
lation of specific ultimate tensile strength (SUTS) versus elongation to 
fracture. The HSSS shows an exceptional combination of specific strength 
(SUTS) and elongation compared with other high-specific-strength alloys. 
Figure 2c shows the relation of specific yield strength (SYS) versus the 
increase in density-compensated tensile toughness during uniform elon- 
gation (UE). The expression UE(SUTS — SYS) corresponds to the in- 
crease of specific toughness due to strain hardening during uniform 
plastic deformation, that is, for the uniform part of the elongation before 
necking begins after yielding. Owing to the high strain hardening cap- 
ability, HSSS shows extraordinarily high specific tensile toughness even 
at ultrahigh levels of SYS over 150 MPa g⁻¹ cm³.
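The strain-hardening toughness measure UE(SUTS − SYS) is simple to evaluate from tabulated properties. The sketch below is illustrative only: the function name is hypothetical, and pairing the first SYS, SUTS and UE entries of Extended Data Table 2 (199, 227 and 18.0) with the same HSSS condition is an assumption, since the table is flattened in this text.

```python
def toughness_increase(sys_value, suts_value, ue_percent):
    """UE x (SUTS - SYS), in MPa % g^-1 cm^3.

    sys_value, suts_value: specific yield and ultimate tensile strength
    (MPa g^-1 cm^3); ue_percent: uniform elongation in per cent.
    """
    if suts_value < sys_value:
        raise ValueError("SUTS is expected to be at least SYS")
    return ue_percent * (suts_value - sys_value)


# Assumed row pairing from Extended Data Table 2 (first listed entries).
value = toughness_increase(199, 227, 18.0)
print(value)
```

With these inputs the product is close to the first value listed in the (SUTS − SYS) × UE column of the table (508); the small difference presumably reflects rounding of the tabulated strengths and elongation.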

Scanning TEM images in Fig. 3 show the interaction of dislocations 
with B2 particles in HSSS that had previously been deformed by 0.5%. 
Dislocations are piled up (Fig. 3a) or bowing out (Fig. 3b) at the phase 
Figure 2 | Room-temperature tensile properties of HSSS compared with
selected metallic alloys of high specific strength. See Extended Data
Tables 1 and 2. a, Engineering stress-strain curves of the annealed HSSS (I-III)
(see Extended Data Table 2 for details) compared with κ-carbide-strengthened
TRIPLEX steels, a commercial titanium alloy and a fully martensitic press
hardening steel (PHS). b, Specific ultimate tensile strength (SUTS) versus
elongation to fracture of HSSS compared with press hardening steel and the
available literature data (AA2000 is a commercial aluminium alloy).
c, Specific yield strength (SYS) versus density-compensated tensile toughness
increase during uniform elongation (UE) of HSSS compared with selected
metallic alloys of high specific strength.
Figure 3 | Scanning TEM images of HSSS after tensile deformation (0.5%
strained) showing interaction of dislocations and B2 particles. The
non-shearable nature of B2 particles is depicted. a, Dislocations pile up at the
interface of B2 particle and austenite (γ) matrix. b, A dislocation bows out at the
B2/γ interface.


interface. It is clear that B2 particles are not sheared by gliding disloca- 
tions. The non-shearable nature of the B2 particle accounts for the high 
work hardening rate even at ultrahigh yield strength levels of over 1 GPa, 
leading to ductile ultrahigh-specific-strength steels. 

These findings provide a new alloy-design route to lightweight steels, 
demonstrating that the combination of specific strength and ductility 
accessible to steels is greater than previously thought, and increasing 
the density-compensated tensile damage tolerance of structural metal 
for terrestrial applications. Furthermore, the attractive combination of 
physical and mechanical properties in the low-density steel described 
here is obtainable by simple thermal treatments which are compatible 
with existing commercial processes of the steel industry (Extended Data 
Table 2). The tuning of the distribution and morphology of brittle in- 
termetallic compounds in steel matrix may be useful in many other steel 
applications. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Extended Data Figure 1 | Change of X-ray diffraction pattern during annealing at 900 °C. With the increase in annealing time, the (100) peak of B2 becomes
more pronounced owing to its precipitation (samples were water-quenched after annealing). au, arbitrary units.
Extended Data Table 1 | Composition of the present HSSS and reference materials

Chemical composition, weight%

Material      Fe      C       Si     Mn     Al     Ti      Nb      V      Cr
HSSS          Bal.    0.86    0.02   16.1   9.6    0.042   0.004   -      -
PHS           Bal.    0.22    0.24   1.2    -      0.04    -       -      0.2
Ti6Al4V       ≤0.12   ≤0.01   -      -      6.1    Bal.    -       3.9    -
TRIPLEX I     Bal.    1.8     -      20     11     -       -       -      5
TRIPLEX II    Bal.    1.0     -      28     12     -       -       -      -

HSSS, high-specific-strength steel; PHS, press hardening steel; Bal., balance; '-', not applicable.
Extended Data Table 2


Materials 


HSSS 


PHS 


TI6AI4V 


Al alloy 
2000 series 


TRIPLEX 


Thermal treatment 


Temp.xtime; cooling 
900 °Cx2m; 30 °C/s 
900 °Cx8m; WQ 
900 °Cx15 m; WQ 
870 °Cx15 m; WQ 
900 °C x10 m; 35 °C/s 
950 °Cx1 h; FC 
950 °Cx1 h; WQ 
700 °Cx2 h; AC 
1050 °Cx1 h; 0.23 °C/s 
1050 °Cx1 h; 0.81 °C/s 
1050 °Cx1 h; 3.40 °C/s 
1050 °Cx1 h; 5.10 °C/s 
730 °Cx1.5 h; AC 
730 °Cx1.5h; AC 


1100 °Cx15m; WQ 
1100 °Cx15m; AC 
1100 °Cx15m; WQ 
1100 °Cx15m; WQ 
1000 °Cx15 m; WQ 
1000 °Cx15 m; AC 
900 °Cx15 m; WQ 
900 °Cx15 m; AC 
1050 °Cx25 m; WQ 


Density compensated mechanical properties 


SYS SUTS 
MPa g⁻¹ cm³
199 227 
177 218 
148 198 
174 216 
145 196 
182 194 
237 256 
223 237 
185 215 
194 224 
201 230 
202 244 
194 217 
181 208 
198 207 
117 154 
132 179 
134 172 
110 156 
134 163 
151 160 
175 183 
142 cbrarg 
127 160 
163 171 
140 172 
169 179 
111 152 


WQ, water quenching; AC, air cooling; FC, furnace cooling; ‘-', not available. 


UE 


18.0 
22.0 
27.6 
22st 


21.0 
45.7 
26.9 
26.2 
12.9 
45.3 


% 


TE 


20.3 
24.7 
31.8 
25.6 


15.2 


(SUTS- 
SYS) 
xUE 
J/g 
508 
904 
1361 
960 
245 


729 
1531 
223 
836 
133 
1862 
Thermal treatment condition and properties of the present HSSS and reference materials
Silylation of C-H bonds in aromatic heterocycles by
an Earth-abundant metal catalyst


Anton A. Toutov¹*, Wen-Bo Liu¹*, Kerry N. Betz¹, Alexey Fedorov¹†, Brian M. Stoltz¹ & Robert H. Grubbs¹


Heteroaromatic compounds containing carbon-silicon (C-Si) bonds 
are of great interest in the fields of organic electronics and photonics’, 
drug discovery’, nuclear medicine’® and complex molecule synthesis**, 
because these compounds have very useful physicochemical prop- 
erties. Many of the methods now used to construct heteroaromatic 
C-Si bonds involve stoichiometric reactions between heteroaryl or- 
ganometallic species and silicon electrophiles® or direct, transition- 
metal-catalysed intermolecular carbon-hydrogen (C-H) silylation 
using rhodium or iridium complexes in the presence of excess hy- 
drogen acceptors*”. Both approaches are useful, but their limitations 
include functional group incompatibility, narrow scope of application, 
high cost and low availability of the catalysts, and unproven scalability. 
For this reason, a new and general catalytic approach to heteroaro- 
matic C-Si bond construction that avoids such limitations is highly 
desirable. Here we report an example of cross-dehydrogenative het- 
eroaromatic C-H functionalization catalysed by an Earth-abundant 
alkali metal species. We found that readily available and inexpensive 
potassium tert-butoxide catalyses the direct silylation of aromatic 
heterocycles with hydrosilanes, furnishing heteroarylsilanes in a sin- 
gle step. The silylation proceeds under mild conditions, in the absence 
of hydrogen acceptors, ligands or additives, and is scalable to greater 
than 100 grams under optionally solvent-free conditions. Substrate 
classes that are difficult to activate with precious metal catalysts are 
silylated in good yield and with excellent regioselectivity. The derived 
heteroarylsilane products readily engage in versatile transformations 
enabling new synthetic strategies for heteroaromatic elaboration, and 
are useful in their own right in pharmaceutical and materials science 
applications. 

Heteroarylsilanes are important motifs in medicinal chemistry and 
drug discovery~’, advanced materials and polymer synthesis’'®, and 
various biomedical applications*"’. In addition, they are emerging as 
one of the most versatile heteroaryl metal species for complex molecule 
synthesis owing to the high natural abundance and low toxicity of 
silicon**. At present, the most common approach to heteroaromatic 
C-Si bond construction involves the interception of heteroaryl lithium 
or magnesium reagents with silicon electrophiles (Fig. 1a, route A). How- 
ever, this method is often limited in scope and requires prefunctiona- 
lization of heteroarenes by using pyrophoric organometallic species in 
stoichiometric quantities’. Powerful heteroaromatic functionalization 
strategies, such as Minisci-type radical substitutions'* and Friedel-Crafts 
reactions'*"*, have been of limited use for C-Si bond construction owing 
to the difficulty of generating the corresponding silyl radicals and sily- 
lium ions. An efficient and regioselective sila-Friedel-Crafts reaction 
to access C3-silylated indoles was recently described”, although the catal- 
ysis required a precious metal ruthenium (Ru) species for activation of 
the hydrosilane. 

Thus far, only complexes based on precious metal elements, namely 
rhodium (Rh) and iridium (Ir), have been demonstrated to catalyse 
the intermolecular C-H silylation of heteroarenes with hydrosilanes’® 
(Fig. 1a, route B). Two examples have been reported: an [Ir(OMe)(cod)]2
precatalyst with a 4,4′-dtby ligand, used at 80 °C (ref. 9), and a
[Rh(coe)2(OH)]2 precatalyst in combination with a MeO-BIPHEP ligand
(Me, methyl; cod, 1,5-cyclooctadiene; 4,4′-dtby, 4,4′-di-tert-butyl-2,2′-dipyridyl;
coe, cis-cyclooctene). Excess quantities of a sacrificial hydrogen
acceptor were necessary for catalyst turnover in these systems.
Although these are both important silylation methods, they rely on cat- 
alysts derived from rare and expensive precious metals, which can be a 
significant limitation, particularly for large-scale syntheses. Moreover, 
substrates containing Lewis-basic nitrogen functionalities are notably 
absent in both reports, limiting the use of these methods in pharmaceu- 
tical science and other biomedical applications. Thus, the development 
of a general catalytic method for heteroaromatic C-Si bond formation 
remains a considerable challenge in the broader field of C-H functio- 
nalization. Here we report that inexpensive and commercially available 
[Figure 1b graphic: KOt-Bu-catalysed C-H silylation of (aza)indoles, furans, thiophenes, pyrroles, pyrazoles, etc., using 1-20 mol% KOt-Bu and [Si]-H. Highlights: Earth-abundant metal (K) catalyst; chemo- and regioselective; no H2 acceptors or additives; TON up to 92; mild reaction conditions; >100 g scale.]

Figure 1 | Approaches to the silylation of heteroarenes. a, Route A, classical 
synthesis of heteroaryl silanes by reaction of organometallic species with 
silicon electrophiles. The organometallic species is typically prepared by 
deprotonation of heteroarenes or by lithium—halogen exchange of heteroaryl 
halides. X' = Cl or Br; Hal, halogen; LG, leaving group. Route B, recently 
emerging direct, transition-metal-catalysed C-H activation/silylation. 

Excess amounts of hydrogen acceptors are required. b, A departure from the 
transition metal catalysis paradigm: KOt-Bu-catalysed, acceptorless, cross- 
dehydrogenative heteroaromatic C-H silylation with hydrosilanes. TON, 
turnover number. 
potassium tert-butoxide (KOt-Bu) catalyses the acceptorless, cross-
dehydrogenative coupling of aromatic heterocycles with hydrosilanes
to generate heteroarylsilanes under mild conditions (Fig. 1b). The alkali 
metal catalyst is compatible with a range of functional groups including 
pyridines, piperidines and amines, making this novel C-H silylation 
method immediately applicable to medicinal chemistry and alkaloid 
natural product synthesis. 

In a recent report, we described the reductive cleavage of C-O bonds
in aryl ethers using stoichiometric quantities of alkali metal alkoxides 
to activate hydrosilanes at elevated temperatures’’. We were surprised 
to observe minor by-products derived from ortho-silylation with diben- 
zofuran as the substrate. Considering prior reports of Lewis-base acti- 
vation of hydrosilanes'*, we questioned whether these unanticipated 
silylation by-products could be pointing to a more general reaction 
manifold. Thus, with 1-methylindole as a model substrate, an extensive 
optimization exercise was conducted (Supplementary Information). We 
observed that the combination of a bulky basic anion (that is, Ot-Bu,
trimethylsilanolate or bis(trimethylsilyl)amide) and, importantly, a po- 
tassium countercation led to the desired C-Si bond formation. KOt-Bu 
proved to be the ideal catalyst, furnishing synthetically useful C2-silylated 
indole 2a in good yield and with >20:1 regioselectivity under mild con- 
ditions. The reaction can be optionally performed under solvent-free 


20 mol% KOt-Bu 
[Si] —H (3 equiv.) 




conditions, which in certain cases leads to improved selectivity. Exper- 
iments and analyses to rule out catalysis by adventitious transition 
metal residues were carefully conducted (Supplementary Information). 

A variety of indoles with Me, ethyl (Et), benzyl (Bn), phenyl (Ph) and
the readily cleavable methoxylmethyl and 2-[(trimethylsilyl)ethoxy]methyl
groups on nitrogen all lead to regioselective C2 silylation in moderate
to good yields (Fig. 2, 2a-2f). We then explored the influence of substituents
at various positions of the indole nucleus and found that Me,
OMe, OBn, CH2OMe and Ph are all compatible, giving the desired
products 2g-2n in 48%-83% yield. Several hydrosilanes were examined
and the silylation products (2o-2x) were obtained in good yield.
A diverse range of N-, O- and S-containing heteroaromatics (Fig. 3),
including pyridine-containing scaffolds (4a-4g and 4j-4l), undergo the
reaction with high regioselectivity. Reactions at decreased catalyst loadings
(1-3.5 mol%; 4j, 4m and 4n) and on a large scale (4h and 4n) demonstrate
the robustness and preparative scale utility of the process. The
reaction scaled to greater than 100 g without loss of catalyst activity
under procedurally convenient conditions (Fig. 4a).

In general, the reaction proved to be selective for electron-neutral and 
electron-rich heterocycles; indoles possessing electron-withdrawing 
groups are unreactive. To further probe the functional group tolerance 
of the method, a comprehensive robustness evaluation was performed”. 


[Figure 2 graphic: silylation of indoles 1 to give C2-silylated products 2 (C3 silylation as the minor pathway). Conditions, C2:C3 selectivity and yield for each product, as far as legible in the extracted text; '-' marks entries that are not legible.]

Product   Conditions        C2:C3     Yield
2a        Neat, 45 °C       >20:1     78%
2b        THF, 45 °C        >20:1     82%
2c        Neat, 60 °C       >20:1     71%
2d        -                 >20:1     45%
2e        -                 10:1      55%
2f        -                 14:1      67%
2g        Neat, 45 °C       >20:1     69%
2h        THF, 25 °C        >20:1     68%
2i        Neat, 45 °C       >20:1     83%
2j        Neat, 45 °C       >20:1     61%
2k        THF, 25 °C        >20:1     64%
2l        THF, 25 °C        >20:1     68%
2m        Neat, 45 °C       >20:1     48%
2n        THF, 45 °C        >20:1     48%
2o        THF, 45 °C        >20:1     68%
2p        Neat, 60 °C       >20:1     78%
2q        MeOt-Bu, 55 °C    >20:1     55%
2r        Neat, 60 °C       14:1      66%
2s        Neat, 60 °C       >20:1     64%
2t        THF, 65 °C        -         65%
2u        MeOt-Bu, 45 °C    >20:1     54%
2v        THF, 45 °C        >20:1     60%
2w        THF, 45 °C        >20:1     58%
2x        THF, 35 °C        >20:1     75%



Figure 2 | Scope of the KOt-Bu-catalysed silylation of indoles. For the 
reactions of 2g and 2i, silylation on the benzylic methyl group was observed 
with tetrahydrofuran (THF) as solvent; solvent-free conditions often led to 
improved regioselectivity and yield. For the reaction of 2k, silylation at C6 was 
observed as a by-product in THF. The reactions of 1,3-dimethyl indole with 


Et3SiH and PhMe2SiH were sluggish, probably owing to steric congestion at C2.
For the reaction of 2o, bisindolyldiethylsilane was isolated as a by-product.
See Supplementary Information for details. [Si]-H = Et3SiH, Et2SiH2,
EtMe2SiH, PhMe2SiH or n-Bu3SiH. MOM, methoxylmethyl; SEM,
2-[(trimethylsilyl)ethoxy]methyl.
[Figure 3 graphic: silylated heteroarenes 4. Entries legible in the extracted text:]

Product   Conditions                                  Selectivity        Yield
4a        THF, 45 °C                                  C2:C3 = 6:1        33%
4b        THF, 45 °C                                  C2:C3 > 20:1       31%
4c        THF, 45 °C                                  C2:C3 > 20:1       50%
4g        Neat, 60 °C                                 C2:C3 > 20:1       69%
4h        THF, 25 °C (75 mmol scale)                  C2:C3 > 20:1       93% (17.3 g)
4i        THF, 25 °C                                  C2:C3 > 20:1       87%
4m        THF, 25 °C (1 mol% KOt-Bu, 5 mmol scale)    α:β > 20:1         92% (TON = 92)
4n        THF, 25 °C (73 mmol scale)                  α:β > 20:1         95% (17.4 g)
4o        THF, 25 °C                                  α:β > 20:1         78%
4s        THF, 25 °C                                  mono:bis = 10:1    80%
4t        THF, 25 °C                                  α:β > 20:1         71%
4u        Dioxane, 85 °C                              o:p > 20:1         37%


Figure 3 | KOt-Bu-catalysed silylation of N-, O- and S-containing 
heteroarenes. Multigram-scale syntheses were presented for 4h and 4n. 
Catalyst loadings can be reduced to 1 mol% with a TON of 92 (4m). For 4j, with 
3.5 mol% KOt-Bu, TON = 23 (82% yield); for 4n, with 1.5 mol% KOt-Bu, 


The results showed that carbonyl groups in general are not tolerated, 
but are compatible if protected as the corresponding acetal. Ar-Br, Ar- 
I, Ar—CN, and Ar—NO, also shut down the reaction. However, Ar-F, 
Ar-Cl, Ar-CF;, epoxide, N-alkyl aziridine, cis- and trans-olefins, acet- 
ylene, pyridine, and tertiary amine and phosphine moieties are all 
compatible with the silylation chemistry. Even free OH and NH groups 
are tolerated to some extent, apparently owing to a fortuitous silylative 
protection of the heteroatom in situ'® (Supplementary Information). 

Preliminary mechanistic investigations suggest the involvement of rad- 
ical species. However, an elementary silyl radical generation-substitution 
mechanism seems to be unlikely owing to poor reactivity with electron- 
deficient heteroarenes'’*”’. Moreover, the rate of silylation is greater in 
sulphur-containing heteroarenes than in oxygen-containing heteroar- 
enes, and is greater in oxygen-containing heteroarenes than in nitrogen- 
containing heteroarenes, as observed in an internal competition study, 
which provides complementary reactivity to electrophilic substitutions’*”* 
and Minisci-type reactions’*”’. These observations point to an under- 
lying mechanism that is distinct from known heteroaromatic C—-H func- 
tionalization reactions (Supplementary Information). 

Heteroarylsilane derivatives are known to undergo a variety of power- 
ful synthetic transformations; a number of representative examples 
are demonstrated here (Fig. 4b). For example, C2 Si-directed Suzuki- 
Miyaura cross-coupling by the method of Zhao and Snieckus*’, or 
Hiyama—Denmark cross-coupling’ via heteroarylsilanol 6”, furnishes 
2-arylated indole 5. An unusual direct C7 functionalization of ben- 
zothiophene to give boronate esters 7 and 8 was achieved by using a 
blocking group strategy from silylated precursor 4h”. 


TON = 61 (91% yield). Bisfuranyldiethylsilane was isolated as a by-product in
the reaction of 4o. Unsubstituted thiophene and furan favoured 2,5-bis-silylation
(4q and 4r). See Supplementary Information for details.
[Si]–H = Et₃SiH, Et₂SiH₂, EtMe₂SiH, PhMe₂SiH or n-Bu₃SiH.


Organosilicon has been extensively investigated in the development 
of advanced materials owing to silicon’s unique physical and chemical 
properties’”®. To demonstrate the utility of our method in possible ma- 
terials science applications, we prepared sila-heterocycle 9 in one step 
directly from the commercially available unfunctionalized heteroar- 
ene by an unprecedented double C-H functionalization involving inter- 
molecular silylation followed by intramolecular silylation’*’®”’ (Fig. 4c). 
A high-yielding bis-silylation of thiophene oligomer 10 furnishes the 
starting material for an entirely transition-metal-free catalytic route 
to alternating copolymers”. Finally, the monoselective silylation of the 
3,4-ethylenedioxythiophene monomer provides a potential strategy for 
the modification of polythiophene-derived materials (Fig. 4c, 11). 

Sila-drug analogues have garnered much attention from medicinal 
chemists because they can offer improved stability, solubility and phar- 
macokinetic properties compared with the parent all-carbon compounds’. 
Moreover, the installed organosilicon functionality can serve as a syn- 
thetic handle for subsequent elaboration, facilitating library synthesis 
and enabling structure-activity relationship studies. As a result, orga- 
nosilicon-containing small molecules are of growing interest in phar- 
maceutical science, and the direct silylation of lead compounds would 
thus represent a new and potentially powerful tool in drug discovery’. 
To evaluate our method for such late-stage C-H functionalization ap- 
plications, we subjected the antihistamine thenalidine and the antipla- 
telet drug ticlopidine to our catalytic silylation conditions. The reactions 
proceeded smoothly in the case of both active pharmaceutical ingre- 
dients, yielding the Si-containing target compounds 12 and 13a-c in 
56%-68% yield with excellent chemo- and regioselectivity (Fig. 4d). The 


©2015 Macmillan Publishers Limited. All rights reserved 


a Practical large-scale preparation of heteroarylsilane building blocks:
(1) 20 mol% KOt-Bu, 1.5 equiv. Et₃SiH, neat, 45 °C; (2) filtration and
distillation. C2:C3 > 20:1, 76% yield.

c Applications to functional materials
Figure 4 | Synthetic applications of the KOt-Bu-catalysed C-H silylation.
a, Preparation of 142 g of C2-silylated indole building block 2a.
b, Application of heteroarylsilanes in cross-coupling and a formal C-H
borylation at C7 of benzothiophene. c, Synthesis of precursors to advanced
materials and polymers. d, Late-stage chemo- and regioselective modification
of active pharmaceutical ingredients. e, KOt-Bu-catalysed functionalization of
arenes by oxygen-directed sp², and innate benzylic sp³ C-H silylation.
See Supplementary Information for


piperidines, aniline, benzylic C-H bonds and aryl chloride moieties were
all tolerated without any observed side reactions. Silylation of aza- 
analogue 14 also proceeded well, demonstrating the compatibility of 
our method with pyridine-containing complex molecules of potential 
pharmaceutical importance. 

Finally, during our investigations we had observed minor amounts 
of sp² and sp³ C-H silylation by-products at ambient temperature in
the cases of methoxy- and methyl-substituted indoles, respectively (that 
is, 15 and 16; Fig. 4e). This led us to consider whether simple arenes 
would react analogously. The ortho-silylation of anisole” and the
directing-group-free C(sp³)-H silylation of toluene”**° were discovered,
furnishing silylated derivatives 17a and 18a, respectively. Four additional
examples were demonstrated, providing silylarenes (17b and 17c) and 
benzylsilanes (18b and 18c) with excellent selectivity. Of particular note 
is the C(sp³)-H silylation of 2,6-lutidine, providing an example of C-H


silylation in an electron-deficient system. Interestingly, methoxy toluene
19 and benzyl ether 21, both containing potentially reactive sp² and
sp³ C-H bonds, are silylated with opposite selectivities to yield 20 and
22. In the case of 22, the reaction introduces a Si-substituted chiral 
centre. Optimization and further elaboration of these substrate classes 
is currently ongoing. 

We have reported sp² and sp³ C-H silylation reactions catalysed by
KOt-Bu, which is abundant, inexpensive, commercially available and 
bench stable. The transformation has been applied to an array of 
privileged heteroaromatic scaffolds and to a number of carbocyclic aromatic
moieties. The potential for late-stage functionalization has been
demonstrated by the direct silylation of active pharmaceutical ingredi- 
ents. The extension of this work to non-aromatic systems is in progress, 
and detailed mechanistic investigations by experimental and compu- 
tational methods are under way. 
A seismic reflection image for the base of a tectonic plate


T. A. Stern, S. A. Henrys, D. Okaya, J. N. Louie, M. K. Savage, S. Lamb, H. Sato, R. Sutherland & T. Iwasaki


Plate tectonics successfully describes the surface of Earth as a mo- 
saic of moving lithospheric plates. But it is not clear what happens 
at the base of the plates, the lithosphere-asthenosphere boundary 
(LAB). The LAB has been well imaged with converted teleseismic 
waves’”, whose 10-40-kilometre wavelength controls the structural 
resolution. Here we use explosion-generated seismic waves (of about 
0.5-kilometre wavelength) to form a high-resolution image for the 
base of an oceanic plate that is subducting beneath North Island, New 
Zealand. Our 80-kilometre-wide image is based on P-wave reflections 
and shows an approximately 15° dipping, abrupt, seismic wave-speed 
transition (less than 1 kilometre thick) at a depth of about 100 kilo- 
metres. The boundary is parallel to the top of the plate and seismic 
attributes indicate a P-wave speed decrease of at least 8 ± 3 per cent
across it. A parallel reflection event approximately 10 kilometres dee- 
per shows that the decrease in P-wave speed is confined to a channel at 
the base of the plate, which we interpret as a sheared zone of ponded 
partial melts or volatiles. This is independent, high-resolution evid- 
ence for a low-viscosity channel at the LAB that decouples plates 
from mantle flow beneath, and allows plate tectonics to work. 
The original concept of plate tectonics envisaged lithospheric plates 
that were mechanically and kinematically distinct from the underlying 
convecting mantle’. Subsequently, the base of the lithosphere has been 
defined as a critical isotherm’, as a transition zone from conductive to 
convective heat flow, or as the base of an elastic layer capable of sup- 
porting flexural stresses on a geological timescale*. As these definitions 
are largely underpinned by thermal criteria, it is perhaps surprising that 
a marked change in geophysical properties at a depth of around 100 km 
is sufficiently consistent to define the LAB’. For example, a seafloor 
magnetotelluric (MT) study of young oceanic lithosphere identifies a 
thin (≤25 km thick) high-conductivity layer at the base of the lithosphere
(at 45-70 km depth), which is interpreted as a melt-rich, low-
viscosity channel®. Recent receiver-function studies have indicated that 
in places the LAB, at depths of 60-110 km, occurs where the S-wave 
speed (vs) drops by ~8% over a depth range of less than 11 km (refs 1, 2).
But the ability to resolve the sharpness of the LAB, which is critical for 
determining its nature’, depends on the wavelength of the seismic waves 
used'”, Active-source seismic methods allow a marked improvement 
in resolution, as large (>200 kg) dynamite shots can create reflected P 
waves with wavelengths of <1 km in the mantle*”, compared to wave- 
lengths of 10-40 km for the teleseismic waves used to generate receiver 
functions’. Active-source and receiver-function methods do, however, 
provide a natural complement to each other because, whereas the for- 
mer give vastly improved resolution, receiver functions provide crucial 
information on phase and therefore allow us to assess if there is a de- 
crease or increase in seismic velocity across a converting interface. 
Our data were acquired during the SAHKE (Seismic Array on the 
HiKurangi margin Experiment"®) experiment (Fig. 1), which was located 
where the ~120-Myr-old’® (120 Ma) Pacific plate and Hikurangi Pla- 
teau subduct westward beneath continental New Zealand (Fig. 1). We 
deployed 878 vertical and 300 three-component seismographs along a 


85-km-long line’, with a geophone spacing of 50-100 m. This design, 
together with 12 (500-kg) explosions in steel-cased boreholes, yielded 
high signal-to-noise ratios and deep P-wave reflections in a wide fre- 
quency band of 8-20 Hz (Fig. 2 and Extended Data Fig. 1). 

Structure in the initial 12 s of two-way travel-time (TWTT) is seen 
on all shot records and shows the top of the subducted Pacific plate at 
depths of 15-30 km, which is concordant with Benioff Zone seismicity’” 
Figure 1 | Location map for SAHKE lines 01-04 and shot points for
SAHKE04. Main panel, offshore lines SAHKE 01-03 are multichannel lines 
recorded both on land and offshore for analysis of crustal structure’®. Red 
stars show position of 12 dynamite shots, of 500 kg each, on the 85-km-long 
onshore SAHKE04 line. Vertical component seismographs (878) were
installed. The direction of oblique subduction of the Pacific plate beneath the 
Australian plate is shown by the black arrow, but the dip direction, or plunge, 
of the subducted plate is parallel to the SAHKE04 line. Top inset, plate 
tectonic setting of New Zealand. Bottom inset, enlarged section showing 
seismograph lines and shot positions of the SAHKE04 line. Mapped faults 
crossed by the seismic line are labelled. 
Figure 2 | Shot record for shot 11 and ray tracing model for arrivals from
shots 11 and 4. Main panel, data shown have been band-pass filtered
(12-20 Hz) and an attribute filter’® applied. The left scale is in two-way travel time
(s). Between 7 and 10 s (~20-30 km) are three distinct reflections linked to
under-plated sediment on top of the subducted plate’®. R0 is an interpreted
reflection from the top of the subducted plate, at the interface between
under-plated sediment and oceanic crust**. Deeper events are the 20 s (at zero
offset), southeast-dipping, R1 reflector and the 25-30 s doublet R2 and R3
reflectors that are interpreted as the LAB zone (dipping to northwest). The red
dashed rectangular box outlines where a doublet (discussed in text) can be
seen with a thickness of ~2 s two-way travel-time (TWTT). The steeply
dipping event at ~8-10 km offset is the airwave. Inset: a ray-tracing solution


(Fig. 2). A strong reflector (R0 on Fig. 2) at about 9 s TWTT has been
previously identified as the interface between a channel of low-vp sub- 
ducted sediments and the top of the Pacific plate. The Moho for the 
Pacific plate is seen at ~13 s TWTT, at zero offset, as a blurred zone of 
reflectivity that becomes prominent as a wide-angle reflection at an 
offset of ~55 km (Extended Data Figs 2 and 3). 

Two deeper reflections observed between 20 and 30s TWTT are seen 
on shot 11 (Fig. 2), and are referred to as R1 and R2. They dip steeply to 
the southeast, and gently to the northwest, respectively. Migration solu- 
tions (Fig. 2 inset, Extended Data Fig. 2b, c) indicate that the likely 
origin of the R1 reflection(s) is west of the SAHKE04 profile and within 
the mantle of the adjacent Australian plate. These R1 reflections are in- 
terpreted as originating from a deep thrust zone linked to a known pe- 
riod of Miocene tectonic shortening"’, and their geological interpretation 
is not discussed here. The key focus of this study is the R2 and associated 
R3 (2-3s TWTT below R2) reflectors, which are at a depth of ~100 km 
(based on velocity models for the crust and mantle of southern North 
Island’®"). All 12 shots and 878 receivers are used to produce an image 
(Fig. 3) where up to 15 shot-to-receiver traces, which reflect from a 
common subsurface reflection point, are stacked together to form a sin- 
gle trace (Extended Data Figs 4 and 5). 

The most plausible interpretation of the R2 and R3 reflectors is that 
they are sharp contrasts in physical properties associated with the LAB 
of the Pacific plate. This way, R2 and R3 parallel the observed dip on 
the top of the plate’, and define a 73 ± 1 km thick lithosphere (Fig. 2,
Extended Data Fig. 4c), consistent with a sub-solidus model for the oce- 
anic LAB’ and slightly thinner than ~ 100-Ma oceanic plates elsewhere 
in the western Pacific’. We eliminate the possibility that R2 and R3 are 
based on reduced (at 6 km s⁻¹) travel-time picks of the R2 reflector from shots
11 and 4 (secondary inset), and the calculated and observed travel time fit. 
The depth to the R2 reflector is 93 and 102 km beneath shots 4 and 11, 
respectively, based on a velocity model built from previous studies'®"” 
(Extended Data Fig. 4b), and it dips 12°-15° to the northwest. The incidence 
angle range for the R2 reflections is ~0°-15°. The position of the southeast- 
dipping R1 reflector is schematically shown based on ray tracing and migration 
(Extended Data Fig. 2). Note that R1 is located along-strike but 50 km 
northwest off the end of the profile and beneath the Taranaki Fault Zone 
(Fig. 1). Numbers represent P-wave speeds in km s⁻¹. LAB is the interpreted
lithosphere-asthenosphere boundary. 


artefacts related to earthquakes, because they are seen in energy returns 
from more than one shot. In addition, forward modelling shows that 
they cannot be explained as multiples of the shallower R0 and R1 crustal
reflectors (Methods). Finally, we investigated the possibility that they 
are ‘sideswipes’ or reflections out of the plane of our transect (Methods); 
crucially the lack of coherent energy in an ‘out-of-plane’ shot gather, 
compared to the in-plane gather (Extended Data Figs 6-8), as well as 
the complete absence of evidence for features with the required dimen- 
sions and orientation in the crust or subducted slab either side of our 
transect, make it difficult to sustain a side-swipe interpretation. We 
note that between 35 and 40s, there is seismic energy (labelled R4; see 
Fig. 3a) that shows some coherency and parallelism to the overlying R2 
and R3 reflectors. Although the timing for this event is consistent with 
it being a S-P conversion from R2 (Extended Data Fig. 9), it is not well 
enough defined to make a confident identification. 

The R2 reflections are in the frequency range of 8-20 Hz (Extended 
Data Fig. 1), consistent with frequencies of other reported deep mantle 
reflections from large dynamite shots*”—such high frequencies can be 
explained by a travel path mainly in the high-Q mantle of the subducted 
Pacific plate’’. For example, a spectral ratio analysis (Extended Data 
Fig. 1e) for the frequency spectra of the R0 and R2 reflections provides
an estimate for Qp of ~1,000 for the oceanic plate, similar to that found
in other seismic studies'*. Additional insight into the nature of the R2 
reflector comes from its seismic attributes. As a rule of thumb, to observe
a reflection, the normalized thickness of the acoustic-impedance
transition zone, d/λ, will be less than 0.5, where d is the transition thickness
and λ is the seismic wavelength’’*'> (Extended Data Fig. 7f). For
R2, the reflected wave has a central frequency of ~14 Hz and a typical
upper-mantle vp of ~8 km s⁻¹, so that λ = vp/f ≈ 0.57 km, giving a thickness
d of ~280 m. This implies that seismic velocity changes across R2 must occur
over a thickness <1 km for a credible reflection to be observed.
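
The rule of thumb above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and its `ratio` argument are assumptions introduced here, and the inputs are the approximate values quoted in the text.

```python
def max_transition_thickness_km(vp_km_s: float, freq_hz: float, ratio: float = 0.5) -> float:
    """Largest transition thickness d (km) that still satisfies d / wavelength < ratio."""
    wavelength_km = vp_km_s / freq_hz  # wavelength = wave speed / frequency
    return ratio * wavelength_km


# Approximate values quoted for R2: vp of about 8 km/s, central frequency of about 14 Hz.
d_km = max_transition_thickness_km(vp_km_s=8.0, freq_hz=14.0)
print(f"limiting transition thickness is roughly {d_km * 1000:.0f} m")
```

With these inputs the limit falls just under 0.3 km, in line with the ~280 m quoted above and well inside the <1 km bound.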

Constraints on vp below the R2 reflector are obtained by compar- 
ison of the instantaneous amplitude’* of reflections with those for the 
shallower R0 reflector (Fig. 2), indicating a reflection coefficient’ for
Figure 4 | Calculated reflection coefficient (C_r) for R2 for P-P reflections
plotted against incidence angle, and interpretation cartoon of the LAB.
a, Plot of reflection coefficient versus incidence angle estimated for R2 and
theoretical curves based on Zoeppritz equations’’. vp and density for the base of
the lithosphere (R2) are fixed at 8.5 km s⁻¹ and 3,400 kg m⁻³, respectively,
and vp is allowed to vary below R2. The different curves show C_r values for
labelled percentage drop in vp and for two density contrasts of −50 and
−100 kg m⁻³ across the LAB. We plot the magnitude of calculated reflection
coefficient against a target region of (C_r)R2 = 0.05 ± 0.014 for our estimate of
the minimum value of C_r acquired from calibration with R0 (Methods).
A velocity drop of 8 ± 2% is implied. If we make a correction for Q (Methods)
for the different ray paths of R0 and R2, then C_r = 0.065 ± 0.027 and the
Figure 3 | A low-fold stack for the 12 shots
recorded on 878 seismographs, with moderate 
and strong median filters. a, Low (1-15)-fold 
stack (fold = number of seismic traces that reflect 
from a common subsurface point that are 

stacked into a single trace) that has been processed 
with a median filter (Methods) that smooths traces 
in both space and time, and acts like a low-pass 
filter”. Plotted at approximate true scale with travel 
time on the left vertical axis, and approximate 
depth on the right vertical axis. The velocity model 
used and detailed fold geometry are given in 
Extended Data Fig. 4. The rectangular moving 
window smoothing kernel was over 21 traces 

(2.1 km) by 101 time samples (1.01 s). The R1 to R4 
reflectors (see text) are highlighted. Colours are 
scaled to the amplitude of reflection. PmP is the 
interpreted position of the Moho for the Pacific 
plate. b, As for a, but with a stronger median 
filter: the rectangular moving window smoothing 
kernel was over 4.1 km laterally and by 2.01 s in 
travel time. Interpretations of the reflections 
labelled in a as R1 to R4 are shown as discussed in 
text. The southeast-dipping R1 mantle reflectors 
are interpreted to originate from a reflector off 
the northwest end of the SAHKE04 profile (see 
Extended Data Fig. 2). Earthquakes from the New 
Zealand GeoNet catalogue are plotted as red 
crosses, and show the top of the subducted Pacific 
plate between depths of ~15 km and 30 km along 
the SAHKE04 profile. 
R2, C_r, of 0.065 ± 0.027, after corrections are made for intrinsic attenuation'®
within the mantle lithosphere and for geometrical spreading of
the wave front (Methods). Deep-mantle reflections analysed in a similar
manner from elsewhere in the world require reflection coefficients
in the range 0.04 to 0.14 (refs 9, 18); our predicted value for C_r is at the
lower end of this range. Using a value of 8.5 km s⁻¹ for vp at the base of
estimate for a drop in vp is 10 ± 4%. The theoretical values of C_r are plotted
against incidence angle out to 15° (with respect to a 15° dipping reflector),
which is the maximum angle for stacked reflections based on our ray-tracing
model. b, Summary schematic for our interpretation of physical properties at
the base of the Pacific plate lithosphere. We interpret a channel about 10 km
thick in which the vp drops at least 8 ± 2% due to the pooling of melt”®”! or
water”” at the LAB implying changes in: vp from this study; percentage melt”””’
as inferred from Δvp, or percentage water content’; viscosity is schematic
but based on our required melt level and model results”; and strain rate based
on this study and assuming the background strain rate in the oceanic
lithosphere is low.
the lithosphere’’, we find a vp drop of 10 ± 4% below R2 is required to
give the observed value of C_r (Fig. 4a), assuming an accompanying drop
in density across R2 of between 50 and 100 kg m⁻³ (Methods). A positive
impedance contrast here could also produce the R2 reflection, but
would require vp > 9.2 km s⁻¹ below R2, which is not plausible at depths
of 100 km (ref. 19). A drop in vp of 8-10% also fits with receiver-function
studies that show negative polarity and a similar percentage drop in vs
across the LAB’.
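
To see roughly how a reflection coefficient of this size maps onto a wave-speed drop, the sketch below inverts the normal-incidence relation of Methods equation (1) for the P-wave speed beneath the interface. This is a simplification and an assumption of the example: the study compares against Zoeppritz curves over incidence angles of 0°-15°, so these numbers need not reproduce Fig. 4a exactly. The function names are hypothetical.

```python
def vp_below_km_s(c_r_magnitude: float, vp_above_km_s: float,
                  rho_above: float, rho_below: float) -> float:
    """P-wave speed below an interface with a negative normal-incidence C_r."""
    z_above = rho_above * vp_above_km_s        # acoustic impedance above the interface
    c_r = -abs(c_r_magnitude)                  # impedance is assumed to decrease downwards
    z_below = z_above * (1.0 + c_r) / (1.0 - c_r)  # from C_r = (Z2 - Z1) / (Z2 + Z1)
    return z_below / rho_below


def percent_drop(c_r_magnitude: float, density_drop: float) -> float:
    """Percentage drop in vp for the lithosphere values quoted in the text."""
    vp_above, rho_above = 8.5, 3400.0          # km/s and kg/m^3 at the base of the lithosphere
    vp_below = vp_below_km_s(c_r_magnitude, vp_above, rho_above, rho_above - density_drop)
    return 100.0 * (vp_above - vp_below) / vp_above


for density_drop in (50.0, 100.0):
    print(density_drop, round(percent_drop(0.065, density_drop), 1))
```

Under these assumptions a coefficient of 0.065 corresponds to a drop of roughly 10%, which sits inside the 10 ± 4% range given above.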

A clear separation of ~2s is observed between R2 and the under- 
lying sub-parallel R3 reflectors for shot 11 (Fig. 2); on the stacked sec- 
tion (Fig. 3a) the separation between R2 and R3 is closer to 3s due to 
smearing inherent in the stacking method (Methods). In addition, the 
amplitudes of R2 and R3 vary in a non-systematic way (Figs 2 and 3a), 
so that R3 cannot be a multiple of R2. We propose that R2 and R3 are
the top and bottom of a low-seismic-wave-speed channel that is 8-12 km
thick (that is, 2-3 s TWTT thick at a vp of ~7.8 km s⁻¹).
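
The conversion from two-way travel time to thickness can be sketched as follows, assuming near-vertical ray paths so that the wave crosses the layer once on the way down and once on the way back. The function name is illustrative only.

```python
def layer_thickness_km(twtt_s: float, vp_km_s: float) -> float:
    """Thickness of a layer that takes twtt_s seconds to cross down and back."""
    one_way_time_s = twtt_s / 2.0
    return vp_km_s * one_way_time_s


for twtt_s in (2.0, 3.0):
    thickness = layer_thickness_km(twtt_s, vp_km_s=7.8)
    print(f"{twtt_s:.0f} s TWTT -> {thickness:.1f} km")
```

The two travel times bracket a thickness of about 8 to 12 km, matching the range stated above.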

Mechanisms for a marked drop in seismic wave speeds at the LAB 
include the onset of layered zones of partial melt”*”, volatiles’”, and 
consequent anisotropy’. Partial melts of about 2% in a LAB channel are 
consistent with our estimated 8-10% drop in vp (ref. 23), and are also in
line with more general petrological arguments based on the study of xe- 
noliths from oceanic volcanoes”’. Whatever the mechanism, a special 
explanation is required to explain approximately equal, but opposite, 
acoustic impedance contrast at the boundaries of the channel. We pro- 
pose that melt or volatiles, ponding at the base of the lithosphere, have 
created a low-viscosity”” and high strain/strain-rate channel with distinct
attributes (Fig. 4b), which is undergoing shear due to relative motion
between the Pacific plate and underlying asthenospheric mantle
(~9 cm yr⁻¹, in the hotspot reference frame”). If all this motion is
accommodated in a zone ~10 km thick, it would produce strain rates
of ~3 × 10⁻¹³ s⁻¹, and with finite shear strains of ~10 in ~1 Myr. We
suggest that under these conditions there would be intense strain and 
melt localization”, giving rise to the observed sharp acoustic impedance 
contrasts at the top and bottom of the layer. 
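
The strain-rate and finite-strain figures follow from simple shear across the channel. The sketch below assumes the relative plate motion is taken up uniformly across the layer; the function names and the Julian-year constant are choices made for this illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def shear_strain_rate_per_s(velocity_cm_per_yr: float, thickness_km: float) -> float:
    """Simple-shear strain rate: relative velocity divided by layer thickness."""
    velocity_m_per_s = velocity_cm_per_yr * 0.01 / SECONDS_PER_YEAR
    return velocity_m_per_s / (thickness_km * 1000.0)


def finite_shear_strain(velocity_cm_per_yr: float, thickness_km: float,
                        duration_myr: float) -> float:
    """Accumulated shear strain: total displacement divided by layer thickness."""
    displacement_km = velocity_cm_per_yr * 1e-5 * duration_myr * 1e6
    return displacement_km / thickness_km


rate = shear_strain_rate_per_s(9.0, 10.0)
strain = finite_shear_strain(9.0, 10.0, 1.0)
print(f"strain rate ~ {rate:.1e} per second, finite strain ~ {strain:.0f} after 1 Myr")
```

For 9 cm per year across 10 km this gives a rate close to 3 × 10⁻¹³ s⁻¹ and a shear strain of about 9 after 1 Myr, consistent with the rounded values in the text.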

A sheared, low-viscosity channel, up to 25 km thick, has been pro- 
posed at the LAB of a 20-Ma oceanic plate, seaward of the trench, based 
on a MT study®. Our study provides new and well-resolved evidence 
of a thinner channel extending down-dip of the trench beneath the 
120-Ma subducted plate. These channels may therefore be a ubiquitous 
feature of oceanic plates, both inboard and outboard of the trench. An 
important question is then whether such channels also exist beneath 
continental lithosphere. In this regard, it may be significant that a dou- 
ble seismic reflection, similar to what we have observed, was found by 
a deep reflection survey at 22 s TWTT beneath the continental shelf of 
Norway, and has also been tentatively interpreted as the LAB”*. This 
raises the intriguing possibility that thin channels of localized partial 
melt, or volatiles, are widespread features at the base of tectonic plates 
in general. If so, they would explain why tectonic plates apparently slide 
with minimal basal drag’’”. More high-resolution studies are required, 
however, and our experiment demonstrates it is possible to do this with 
active-source seismic methods. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Seismic processing. For the image in Fig. 3a we applied a regular processing
sequence using Global Claritas, as follows. (1) ADDGEOM: geometry
and CDPs written to trace headers. (2) TREDIT: edit bad traces. (3) FDFILT:
one filter per trace: bandpass (8-14-20-25 Hz). (4) AIRWAVE: air wave mute.
(5) GENSORT: traces sorted into CDP bins with nominal spacing of 100 m. (6) NMO:
normal moveout applied using input stacking velocity file (Extended Data Fig. 8).
Stretch mute at 500% with taper of 1 s. (7) ATTRIBUTE: conversion to instantaneous
amplitude. (8) STACK: stack with unity normalization; trace offset range =
−50 to 50 km. (9) AGC: normal AGC. Window = 9,990 ms. (10) WRITESEGY: disk
file written.

We applied to the final stack (Fig. 3a) an amplitude-enhancement smoothing 
method that is equivalent to a form of low-pass filter in both space and time. This is 
called a median filter”, which is a nonlinear digital filtering technique used to re- 
move noise. A median filter replaces each data point in a data set with the median 
of the data points in a specified region centred on that data point. The smoothing 
was applied to the stack after converting the seismogram traces, having both pos- 
itive and negative amplitudes, to the all-positive instantaneous amplitude or enve- 
lope attribute’®. The dimensions for the image in Fig. 3a of the rectangular moving 
window smoothing kernel were 21 traces (2.1 km) by 101 time samples (1.01 s). 
The smoothing centred the moving window over each sample in the stack, and
placed at that location the square of the median of all the amplitude envelope
samples within the window, equally weighted (a squared ‘median filter’, as used in
image processing and described at http://en.wikipedia.org/wiki/Median_filter). For
the section in Extended Data Fig. 5c, the instantaneous amplitudes are multiplied 
at each sample by the squared median-filtered amplitudes of Fig. 3a. Thus, the stack 
is enhanced at each point by the squared median of the neighbouring amplitudes, 
making the distribution of amplitudes less Gaussian. 
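
A minimal pure-Python sketch of the squared median filter described above is given here for illustration. It is not the Global Claritas implementation: the function name, the list-of-lists data layout and the truncation of windows at the section edges are all assumptions of this example.

```python
from statistics import median


def squared_median_filter(envelope, half_traces, half_samples):
    """Replace each envelope sample by the square of the median in its window.

    `envelope` is a list of traces, each a list of non-negative
    instantaneous-amplitude samples. Windows are truncated at the edges
    of the section (an assumption of this sketch).
    """
    n_traces = len(envelope)
    n_samples = len(envelope[0])
    smoothed = [[0.0] * n_samples for _ in range(n_traces)]
    for i in range(n_traces):
        for j in range(n_samples):
            window = [
                envelope[a][b]
                for a in range(max(0, i - half_traces), min(n_traces, i + half_traces + 1))
                for b in range(max(0, j - half_samples), min(n_samples, j + half_samples + 1))
            ]
            smoothed[i][j] = median(window) ** 2
    return smoothed


# Toy section: uniform background amplitude with a single noise spike.
section = [[1.0] * 7 for _ in range(5)]
section[2][3] = 100.0
filtered = squared_median_filter(section, half_traces=1, half_samples=1)
print(filtered[2][3])
```

In this layout the 21-trace by 101-sample kernel used for Fig. 3a would correspond to `half_traces=10` and `half_samples=50`. The toy example shows the main property of the median: an isolated spike is suppressed rather than smeared across its neighbours.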

Calculating C_r for the R2 (LAB) reflector. If the absolute reflection coefficients
(C_r) from a deep mantle reflection can be deduced from shot gathers, one can gain
estimates for the change in seismic wave-speeds across the reflector.

The reflection coefficient at normal incidence is given by:

$$C_r = \frac{Z_2 - Z_1}{Z_2 + Z_1} \qquad (1)$$

where acoustic impedance $Z = \rho v_P$, $v_P$ is P-wave speed, $\rho$ is density and $Z_1$ and $Z_2$
are the associated values at top and bottom of a reflector. At non-zero incidence
the S-wave velocities become important, and the Zoeppritz equations’’ give the
general expression for C_r at any angle of incidence between 0° and 90°. Density is
also a parameter in equation (1), but as it usually varies by only a few per cent
its variation has a minor effect on C_r compared to the changes in seismic wave speeds.
C_r can vary from −1 to 1. A negative value of C_r implies a phase change at the
boundary. In our study the data are not clear enough to determine phase so we only
consider the magnitude of C_r.

Calculating C, of deep reflections often involves the calibration of the amplitude 
for the reflection of interest with a shallower one that is from an interface with a 
known acoustic impedance contrast'*. For example, in marine experiments the am- 
plitudes of deep mantle reflections recorded can be calibrated against the water 
bottom reflection’®. As we are dealing with land-based data, we have to choose an 
upper crustal reflection as a calibration standard. In order to make this calibration 
between two reflections at different depths, two corrections must be made: for 
geometrical spreading of the wavefront, and for intrinsic attenuation between the 
two reflections (Q⁻¹)!*!”. These are discussed further below. Because of the uncertainties
surrounding Q, and its nonlinear relationship to seismic amplitude, we
can at least place a minimum bound on C_r if we ignore the effect of Q’*.

On shot 11 there is a sequence of three reflectors of varied quality between 7 and
10 s TWTT (Fig. 2). The third reflector (R0 on Fig. 2) is the strongest, and has been
interpreted from the SAHKE active source experiment as the interface between
under-plated sediments and the underlying oceanic crust’®. Receiver function studies”
from the northwest end of the SAHKE line also clearly identify a low velocity layer
on top of the subducted plate beneath southern North Island (Extended Data Fig. 9).
Four studies identify vp in this channel to be in the range 4.4-5.1 km s⁻¹ with an
average of 4.87 km s⁻¹ and standard deviation of 0.32 km s⁻¹ (refs 10, 28). We adopt
vp = 4.87 km s⁻¹ and density = 2,500 kg m⁻³ for the low velocity channel, and vp =
6.5 km s⁻¹ (ref. 10) and density = 2,900 kg m⁻³ for the oceanic crust below R0,
giving a reflection coefficient (based on equation (1)) of (C_r)R0 = 0.22. We assume
the density uncertainty is small compared to that of the P-wave speeds. We allowed
vp of the oceanic crust below R0 to vary between 6.4 and 6.7 km s⁻¹, the vp on top of
R0 to vary between 4.4 and 5.1 km s⁻¹, and then ran a Monte Carlo simulation of
1,000 random samples to obtain a predicted value of the reflection coefficient of R0
(±2 s.d.):


$$(C_r)_{R0} = 0.22 \pm 0.04$$
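
The calibration value for R0 can be reproduced approximately with the sketch below. The text does not state which sampling distribution was used for the Monte Carlo step, so drawing the wave speeds uniformly from the quoted ranges is an assumption of this example, as are the function name and the fixed random seed.

```python
import random
from statistics import mean, stdev


def reflection_coefficient(vp_top, rho_top, vp_bottom, rho_bottom):
    """Normal-incidence reflection coefficient from equation (1)."""
    z_top = rho_top * vp_top            # impedance above the interface
    z_bottom = rho_bottom * vp_bottom   # impedance below the interface
    return (z_bottom - z_top) / (z_bottom + z_top)


# Nominal values adopted in the text for the sediment channel and oceanic crust.
nominal = reflection_coefficient(4.87, 2500.0, 6.5, 2900.0)

# Monte Carlo: vary only the wave speeds, holding densities fixed (assumed uniform draws).
rng = random.Random(0)
samples = [
    reflection_coefficient(rng.uniform(4.4, 5.1), 2500.0, rng.uniform(6.4, 6.7), 2900.0)
    for _ in range(1000)
]
print(f"nominal C_r = {nominal:.2f}")
print(f"Monte Carlo mean = {mean(samples):.2f}, 2 s.d. = {2 * stdev(samples):.2f}")
```

The nominal value rounds to 0.22, and under the uniform-sampling assumption the two-standard-deviation spread comes out near the ±0.04 reported above.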




We solve for (C_r)R2 where (C_r)R0/(C_r)R2 = (R0/R2)amp, and where (R0/R2)amp is the
observed ratio for the R0 and R2 reflections.

To obtain an estimate of this ratio we compared and measured ratios of instantaneous
amplitude, on common traces, for the near vertical incidence angle reflections
R0 and R2, on shot 11. The only processing is a spherical divergence
adjustment that compensates for the natural attenuation of signal due to geometrical
spreading of the wavefront based on the adopted velocity model and travel
time’. We made 13 independent measurements of (R0/R2)amp to give a mean value
of 4.4 with standard deviation of 1.45. The standard error of the mean is therefore
1.45/√13 = 0.4.

The predicted reflection coefficient of R2 is: 


$$(C_r)_{R2} = \frac{(C_r)_{R0}}{(R0/R2)_{\mathrm{amp}}} = \frac{0.22}{4.4} = 0.05 \qquad (2)$$


Uncertainties. We estimate the uncertainty from the propagation of errors for a
quotient quantity”. That is, if X = A/B, the variance of X is given by


$$S_X^2 = \frac{A^2 S_B^2 + B^2 S_A^2}{B^4}$$


where $S_B$ and $S_A$ are standard deviations of the mean. So for A = 0.22 and B = 4.4,
we find $S_A$ = 0.02, $S_B$ = 0.4, A/B = 0.05, and from the above, $S_X$ = 0.0064. At the
95% confidence level (2σ): critical t (for 95% and 13 − 1 = 12 degrees of freedom)
is 2.18 (using Excel tinv(0.05,12) function). Therefore (C_r)R2 = 0.05 ± 2.18 ×
0.0064 (95%) = 0.05 ± 0.014.
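
The quotient error propagation above can be verified numerically. In the sketch below the function name is an illustrative choice, and the critical t value of 2.18 is taken directly from the text rather than computed.

```python
import math


def quotient_std(a: float, s_a: float, b: float, s_b: float) -> float:
    """Standard deviation of X = A / B from first-order error propagation."""
    variance = (a**2 * s_b**2 + b**2 * s_a**2) / b**4
    return math.sqrt(variance)


# Standard error of the mean amplitude ratio from 13 measurements.
sem_ratio = 1.45 / math.sqrt(13)

# Values quoted in the text: A = (C_r)R0 and B = (R0/R2)amp, with rounded uncertainties.
s_x = quotient_std(a=0.22, s_a=0.02, b=4.4, s_b=0.4)

# 95% half-width using the critical t for 12 degrees of freedom quoted in the text.
half_width_95 = 2.18 * s_x

print(f"SEM = {sem_ratio:.2f}, S_X = {s_x:.4f}, 95% half-width = {half_width_95:.3f}")
```

The three printed quantities round to 0.40, 0.0064 and 0.014, as stated.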

We also applied a Monte Carlo method of error evaluation to estimate $(C_r)_{R2}$.
We ran 1,000 random samples using the means and standard deviations of the two
quantities that define $(C_r)_{R2}$ (that is, $(C_r)_{R0}$ and $(R0/R2)_{amp}$) and we get the
following:


$$(C_r)_{R2} = 0.05 \pm 0.007 \ (1\ \text{s.d.})$$ or


$$(C_r)_{R2} = 0.05 \pm 0.014 \ (2\ \text{s.d.}) \qquad (3)$$


So identical results are obtained from both methods of error evaluation. 
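
A Monte Carlo check of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: both quantities are assumed to be normally distributed and independent (the text does not state the sampling distribution), and the function name, seed and sample handling are hypothetical.

```python
import random
import statistics

def monte_carlo_ratio(mean_a, sd_a, mean_b, sd_b, n_samples=1000, seed=0):
    """Sample A/B with independent Gaussian A and B (an assumption)."""
    rng = random.Random(seed)
    samples = [
        rng.gauss(mean_a, sd_a) / rng.gauss(mean_b, sd_b)
        for _ in range(n_samples)
    ]
    return statistics.mean(samples), statistics.stdev(samples)

# Means and standard deviations quoted in the text
mc_mean, mc_sd = monte_carlo_ratio(0.22, 0.02, 4.4, 0.4)

print(f"(C_r)_R2 = {mc_mean:.3f} +/- {mc_sd:.3f} (1 s.d.)")
print(f"(C_r)_R2 = {mc_mean:.3f} +/- {2 * mc_sd:.3f} (2 s.d.)")
```

The exact digits depend on the random seed, but the spread should be close to the analytical value from the quotient formula.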

Finally, we consider a correction for intrinsic attenuation. The change in amplitude
($\Delta A$) due to a seismic wave (frequency $f$) traversing a medium for a time $t$,
with attenuation ($Q^{-1}$), is given by’:


$$\Delta A = \exp(-\pi f t/Q)$$


Now $Q$ in the subducted plate beneath the North Island is high (~1,000), whereas
$Q$ in the crust above is much lower (~350) yet variable’*. The central frequencies of
the R0 and R2 reflections are about 19 and 14 Hz, respectively, and the two-way travel
times are 9 and 27 s, respectively.

The Q correction factor for the amplitude ratio $(R0/R2)_{amp}$ is then:


$$\Delta A = \exp(-\pi f_{R0} t_{R0}/Q_{R0}) \,/\, \exp(-\pi f_{R2} t_{R2}/Q_{R2\text{-avge}}) \qquad (4)$$


where $Q_{R2\text{-avge}}$ is the path-averaged attenuation (over $n$ layers) down to R2 and
back, given by’?:

$$Q_{R2\text{-avge}} = \sum_{i=1}^{n} t_i \Big/ \sum_{i=1}^{n} (t_i/Q_i) \qquad (5)$$

where $t_i$ and $Q_i$ are respectively the travel time and attenuation in the $i$th layer. We
adopt a two-layer model with $t_1$ and $t_2$ equal to 9 and 18 s, respectively, and with $Q_1$
and $Q_2$ equal to 350 and 1,000, respectively, and the path-averaged Q to R2 is (from
equation (5)) 617.
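
The path-averaged Q of equation (5) is a travel-time-weighted harmonic mean, which can be sketched in a few lines of Python. The function name `path_averaged_q` is illustrative and not taken from the original work; the layer values are those of the two-layer model described above.

```python
def path_averaged_q(travel_times, q_values):
    """Travel-time-weighted harmonic mean of Q over layers (equation (5))."""
    if len(travel_times) != len(q_values):
        raise ValueError("need one Q value per layer travel time")
    total_time = sum(travel_times)
    total_attenuation = sum(t / q for t, q in zip(travel_times, q_values))
    return total_time / total_attenuation

# Two-layer model from the text: 9 s in the crust (Q = 350),
# 18 s in the subducted plate (Q = 1,000)
q_r2_avge = path_averaged_q([9.0, 18.0], [350.0, 1000.0])
print(f"Path-averaged Q to R2: {q_r2_avge:.1f}")  # about 617.6
```

Because the average is weighted by attenuation per unit time, the low-Q crust pulls the result well below the simple time-weighted mean of the two Q values.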

The correction factor $\Delta A$ (equation (4)) is sensitive to both adopted frequencies
and Q values. For example, if we fix the central frequencies as above and adopt a
range of Q = 350 ± 100 for the crust based on published cross-sections’, then the
correction is 1.3 ± 0.2, which we use to scale the result of equation (3) to get:


$$(C_r)_{R2} = 0.065 \pm 0.027 \qquad (6)$$


Here we have calculated the uncertainty using the propagation of errors for a
product”. This is the preferred value of $(C_r)_{R2}$, although the uncertainty has doubled
from that in equation (3). Therefore, following ref. 18, an alternative, and more
robust, conclusion is that if we ignore the correction for Q then the lower bound on
$(C_r)_{R2}$ is 0.05 ± 0.014.

Tests for R2 being earthquakes, multiples or sideswipe. Earthquakes. R2 and R3
are seen on multiple shots (Extended Data Fig. 3) and form a coherent stacked section,
even when the best shot (shot 11) is not included in the stack (Extended Data
Fig. 5a). We therefore reject the possibility that R2 and R3 are earthquakes that
have fortuitously appeared across our records.

Multiples. Wave equation modelling (Extended Data Fig. 6) shows that multiples
of the top-of-plate reflection (R0 at 9 s) will not appear at 27 s
with the correct moveout to replicate R2. Such multiples, if they
are present, will not be mistaken for the primary R2 and R3 reflections
because the layers are dipping: the first multiple will appear to be from a layer that is
twice as deep as the reflector, and with a dip that is twice as large*”. The next multiple
after that will double in dip, and so on. For a dipping reflector the apex of the
reflection hyperbola will move laterally by’’:


$$X = 2d \sin\phi \qquad (7)$$


where $\phi$ and $d$ are respectively the dip of, and perpendicular distance to, the reflector.
The primary reflector for R* on Extended Data Fig. 6 has moved ~12 km
updip from the shot point (that is, $d$ = 22 km, $\phi$ = 15° in equation (7) above) and
the multiple (M*1) has moved ~50 km updip ($d$ = 44 km, $\phi$ = 30°). So by the time
the second multiple crosses the position of the R2 and R3 reflections, the multiples
are dipping at a very high angle and could not be mistaken for R2 or R3 (Extended
Data Fig. 6). Note that the synthetic seismic section generates much stronger multiples
than are seen in reality, where topography and near-surface effects weaken them.
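
Equation (7) can be evaluated directly for the two cases quoted. The sketch below is illustrative only (the function name `apex_shift_km` is hypothetical); it returns roughly 11 km for the primary and 44 km for the first multiple, the same order as the ~12 km and ~50 km shifts read from the synthetic section.

```python
import math

def apex_shift_km(perp_distance_km, dip_deg):
    """Lateral shift of the reflection-hyperbola apex, X = 2 d sin(phi)."""
    return 2.0 * perp_distance_km * math.sin(math.radians(dip_deg))

# Primary reflection R*: d = 22 km, dip = 15 degrees
primary_shift = apex_shift_km(22.0, 15.0)
# First multiple M*1: apparent depth and dip both doubled
multiple_shift = apex_shift_km(44.0, 30.0)

print(f"Primary apex shift:        {primary_shift:.1f} km")
print(f"First-multiple apex shift: {multiple_shift:.1f} km")
```

Doubling both the apparent depth and the dip roughly quadruples the updip shift, which is why multiples separate so clearly from primaries on a dipping structure.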
Sideswipe. Reflections from out of plane surfaces can appear on shot records look- 
ing like they are from reflectors directly below. We list five tests to check for side- 
swipe phenomena: (1) Coherency. We made shot gathers with a median filter for 
shot 11 that are both in line with the seismic line and at right angles to it (Extended 
Data Fig. 8). If R2 is sideswipe appearing with some coherence on the inline shot 
record, it should be even more coherent in the short cross line record. It is instead 
less coherent, suggesting R2 is more likely to be a reflection from a surface directly 
below the seismic line rather than one to a side. (2) Prestack depth migration. We 
performed pre-stack depth migration’® both inline with the seismic profile and at 
right angles to it (cross line) (Extended Data Fig. 7). On the inline migration we see 
coherent energy captured in both the 30 and 90 km depth ranges, whereas there is 
no visible energy at all in the cross-line migration (Extended Data Fig. 7). This is con- 
sistent with the reflections coming from directly below the line, and not sideswipe 
from a structure to the side of the profile. (3) Frequency content of R1 and R2. We
showed from a spectral ratio analysis that the higher frequency of R2 (8-20 Hz) is
consistent with a raypath from R0 to R2 through high (~1,000) Q upper mantle
(Extended Data Fig. 1d). In contrast the R1 reflection, which has a similar travel 
time to R2, has a lower frequency content (6-10 Hz). When R1 is migrated (Extended 
Data Fig. 2) we can see its travel path is almost entirely through the crust, which in 
this area has much lower value of Q of about 300-400 (ref. 13). If R2 were indeed a 
sideswipe, its path would also be mainly through heavily fractured crust and we 
would then expect a lower frequency signal like R1. (4) Three-dimensionality in the 
survey geometry and effect on the stack. If we draw a straight line between shots 1 
and 12 then there is deviation of the shot locations from that line of up to 8 km. 
These deviations would manifest themselves as large static corrections on a stack 
of sideswipe reflections, which would degrade the quality of the stack. (5) Strike of 
geology and structure. The dominant strike of the geological structure and strike 
slip faults in the vicinity of the SAHKE line is northeast-southwest, which is at right 
angles to the seismic line (Fig. 1). The strike-slip faults that have been active for the 
past 20 Myr are spaced at 10-20 km across the lower North Island; therefore, if 
there were structures in the lower crust, or upper mantle, at right angles to the faults 
they would have been differentially offset by motion on these faults and thus not 
likely to produce coherent returns over lateral distances of 10-20 km. Finally, no 
obvious offset or tear of the subducted plate is evident beneath the southern North 
Island”. 
Extended Data Figure 1 | Frequency spectra and analysis. a, Frequency
spectrum of the R2 reflection and the background noise. We only show the
frequencies above the 4.5-Hz cut-off frequency of the geophones used.
b, Frequency spectrum of the R0 and R2 reflections. c, Table summarizing the
frequency range for all events on the different shots; summaries of the
geological conditions in each shot hole™ are also given. d, A spectral ratio
analysis’® between R2 and R0 reflections that yields a least squares linear fit to
data to give a gradient of −0.0627. Theoretically this gradient value can be
equated to πΔt/Qp (ref. 16), where Δt is the TWTT between the R0 and R2
reflectors. For Δt = 20 s, we get an estimate of Qp (inverse attenuation) of 1,002
with a standard error of ±30 (based on least squares linear regression).
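
The conversion from spectral-ratio gradient to Qp described in panel d can be reproduced with a short calculation. This is a sketch that assumes the gradient is the slope of the logarithmic spectral ratio against frequency, so that its magnitude equals πΔt/Qp; the function name is illustrative.

```python
import math

def q_from_spectral_gradient(gradient, delta_t_s):
    """Invert |gradient| = pi * delta_t / Q for Q (assumed relation)."""
    if gradient == 0:
        raise ValueError("a zero gradient implies no measurable attenuation")
    return math.pi * delta_t_s / abs(gradient)

# Values quoted in the caption: gradient -0.0627, delta_t = 20 s
q_p = q_from_spectral_gradient(-0.0627, 20.0)
print(f"Qp = {q_p:.0f}")  # about 1002
```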
Extended Data Figure 2 | Pick migration for R1 reflector and shot gathers.
a, Plan view of onshore and offshore SAHKE lines with shots 11 and 12 labelled. 
b, Pick migrations for R1 reflection for shot 11 based on a laterally varying 
velocity model, which is derived from earthquake and shot data’®. We use 
the velocity model created by 3D tomography to carry out a migration of 
reflection picks'®. Where the arcs converge to a constant solution gives the best 
structural interpretation of the reflector. We show the results for two shots 
(b, shot 11; c, shot 12) and the common solution. The solution (black bar) 
appears to be a reflector in the mantle of the Australian plate, which dips to the 
southeast and is located within the Taranaki Fault Zone. d, Shot gather for shot
11 with low band-pass filter of 5-10 Hz that brings out the low frequency, R1,
southeast-dipping reflector (20 s depth at zero offset). It also shows the R0
reflector (~9 s at zero offset) for the top of the plate that dips to the northwest.
Vertical axis is travel time (s) and horizontal axis is shot offset in km. S, is the
crustal refracted S wave. e, Shot gather for shot 11 with band pass filter
16-30 Hz that shows up the R2 reflector with its broad frequency content,
and suppresses the low frequency R1 reflection. R,.g and PmP represent
reflections interpreted to come from the upper surface of the sediment 
channel on top of the oceanic crust, and from the Moho of the oceanic 
Pacific plate, respectively. 
Extended Data Figure 3 | Two shot gathers from each line end showing
reflections R0 to R3. a, Shot 12; b, shot 3. Both plots show data that have been
band-passed between 8 and 25 Hz. The quality of the shots is highly dependent
… basement rock for shots 9, 10 and 11, and the very bottom of shot hole 12.
Extended Data Figure 4 | Stacking and model sensitivities. a, Stacking chart
for image shown in Fig. 3a. Colour scale shows magnitude of the fold, orange
stars are shot points. Numbers on the grid are metres north and east from
the New Zealand Map Grid. b, Plots of seismic velocity versus depth’? and of
stacking velocity (the root-mean-squared average seismic velocity between
the surface and a specific depth) versus depth used in the processing of the
seismic section (Fig. 3). c, Plot of predicted plate thickness versus average vp for
the oceanic mantle; four plots are shown, each one derived using a different
value of average crust vp (labelled). Based on the position in travel time for
the reflectors at zero offset on shot 11 (Extended Data Fig. 2c), we take the travel
time in the oceanic crust to be 4 s and that in the oceanic mantle lid to be 14 s.
The green dashed box represents the preferred range of solutions based on
likely velocities for the oceanic crust and oceanic upper mantle’’. Note that the
uncertainty in the thickness of plate determination is ≪ ±1 km.
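
The trade-off plotted in panel c can be illustrated with a simple travel-time conversion. The sketch below assumes the 4 s and 14 s are two-way travel times and uses hypothetical average velocities (6.5 and 8.3 km s⁻¹) chosen only for illustration; neither the function name nor these velocities come from the original work.

```python
def layer_thickness_km(two_way_time_s, vp_km_s):
    """Thickness of a layer from its two-way travel time and average vp."""
    return vp_km_s * two_way_time_s / 2.0

# Travel times from the caption (assumed two-way)
CRUST_TWT_S = 4.0
MANTLE_LID_TWT_S = 14.0

# Hypothetical average velocities, for illustration only
crust_km = layer_thickness_km(CRUST_TWT_S, 6.5)
mantle_lid_km = layer_thickness_km(MANTLE_LID_TWT_S, 8.3)
plate_km = crust_km + mantle_lid_km

print(f"Crust: {crust_km:.1f} km, mantle lid: {mantle_lid_km:.1f} km")
print(f"Plate thickness: {plate_km:.1f} km")
```

Because the mantle lid accounts for most of the travel time, the predicted plate thickness is far more sensitive to the assumed mantle vp than to the crustal vp.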
Extended Data Figure 5 | Stacking tests. a, Stack as for Fig. 3a but without
shot 11. This is a check to ascertain how much the high-quality, higher-frequency
shot 11 dominated the stack. We still see the main features in this
stack, suggesting that the other shots are making a significant contribution.
Dashed line shows our interpreted position for the Moho of the oceanic Pacific
plate (PmP reflection). b, A stack with no median filter applied.
Extended Data Figure 6 | Wave-equation modelling for a dipping plate
model. a, A model that simulates a highly simplified (oceanic crust not
included) SAHKE structure for input into wave equation modelling. A layer of
low-wave-speed sediments is included at the base of the crust, so we can
examine how multiples from this channel interfere with the proposed R2
and R3 reflections at the base of the plate. The simulation is based on e3d*° and
is run with vS = 0, so no S-wave reflections are created. b, Synthetic shot
gather for shot 11 geometry based on the model. R* and R0 are respectively
reflections from the top and the bottom of the sedimentary layer; R2 and R3 are
reflections from the LAB channel. The M*1 and M01 events are interpreted as
multiples of R* and R0, respectively, based on the lateral movement of the
apex for the reflection hyperbola (see Methods). The second multiple appears to
cross the R2 and R3 reflections, but is doing so at such a high angle it would
be detected if present. Inset, the shot gather for shot 11 where no multiples
can be seen at all. This is interpreted to mean multiples are not efficiently
created where surface topography is rough, as it is for much of the SAHKE line.
xdiff is diffraction from the pinch-out of the crustal layer marked as x in a.
M*2 is interpreted as a second order multiple of R*.
Extended Data Figure 7 | Pre-stack depth migration tests. a, The two
azimuths used for 3D Kirchhoff-sum, pre-stack depth migration” are in-line
with the seismic line and at right angles to it (the cross-line). The velocity
model used was an earlier version of that shown in Extended Data Fig. 5b with
slightly lower velocities. The input shot gathers for 3D migration were the
instantaneous amplitude-attribute traces, heavily smoothed by median filtering
as in Fig. 3b, then bandpass filtered to produce an approximate zero-phase,
low-frequency wavelet at each pre-stack reflection with strong amplitude.
b, The in-line migration, showing coherency in the top 30 km with the dip
of the top of the plate evident. Here constant travel-time diffraction arcs are
created from each shot point. Where energy is enhanced we interpret there
to be reflectors, and conversely if there is no enhancement, but just the
diffraction arcs, we interpret this to be a lack of coherent structure. Coherent
energy is seen at greater depths (90-100 km), which we ascribe to structure
of the LAB. c, Here the data of each shot gather are randomized (jack-knife
test'®), and put through the same migration. Comparing the randomized with
the correct data migration shows what are artefacts of the migration geometry,
and what is signal. This confirms that images between 90 and 100 km depth
seen in b are real. d, Result of migration in the cross-line direction; no coherent
alignments are seen. e, Jack-knife test for the cross-line migration. Note
that e looks similar to d, suggesting there is no alignment of structure in the
cross-line direction, and that the reflections we are imaging are not side-swipe
from out of plane structures. f, Schematic plot showing the relationship
between the normalized thickness of the acoustic impedance transition zone,
d/λ, and the relative (that is, normalized to the value for a vanishingly thin
transition zone) reflection coefficient. Here d is the thickness of the zone, λ is
the seismic wavelength, and the reflection coefficient is normalized to that
for an impedance contrast of zero thickness". Inset, schematic of the transition
zone d between two layers of different velocities, v1 and v2.
Extended Data Figure 8 | Gathers for shot 11 where the offsets are plotted
in-line and cross-line. In a, the maximum offset from the shot is 70 km in a
northwest to southeast azimuth (see Extended Data Fig. 7a). In b, the maximum
offset of seismic receivers from shot 11, in a southwest to northeast azimuth,
is 7.5 km. These are gathers of median filtered data (0.5 s by 41 traces),
smoothed and passed through an 8-25 Hz bandpass filter, plotted in the 24-32 s
range to bring out the R2 and R3 reflectors. Note the coherency in the in-line
direction and lack of coherency in the cross-line gather. This further
corroborates evidence from the pre-stack depth migration that the R2 and R3
reflections come from a surface below the northwest-southeast striking
SAHKE04 line and are not side-swipe from out of plane reflections.
[Panel label: Shot 11, gather of 878 traces on NE-SW azimuth (cross-line to SAHKE profile, 7.5 km long)]
Extended Data Figure 9 | Models. a, Model of the structure and seismic
velocities just north of the SAHKE line based on receiver functions”*. Note the
proposed subduction channel of accreted sediments, in line with more recent
work”, and the interpreted 13° dip, which is in the 12°-15° range proposed
in this study. b, A simple horizontal-layered model which is used as input for
the synthetic wave equation modelling using software e3d** shown in c. This
is a simplified approximation model for our observations beneath SAHKE04,
although we ignore dip. c, Wave equation modelling” based on the input
model shown in b for an input P wave and vertical geophones. No gain
control is applied. Strong primary reflections and multiples from the top of the
plate and oceanic Moho are shown. In reality, the surface multiples are not that
strong because of scattering from topography. The R2 and R3 reflections can be
seen as being weak events compared to R0. d, Wave equation modelling”
based on the input model shown in b for an input S wave and vertical geophones.
Here the S-wave reflections are more prominent and S-P converted phases
of R2 and R3 are predicted at about 38 and 41 s. Note that the S-P and S-S
reflections have no energy at zero incidence angle and significant energy
only for offsets >30 km. As we are only stacking data with maximum offsets
of 50 km, we expect that some, but limited, S-P energy is contributing to
the R4 phase of Fig. 3a.
Recoded organisms engineered to depend on synthetic amino acids
Genetically modified organisms (GMOs) are increasingly used in
research and industrial systems to produce high-value pharmaceuti- 
cals, fuels and chemicals’. Genetic isolation and intrinsic biocontain- 
ment would provide essential biosafety measures to secure these closed 
systems and enable safe applications of GMOs in open systems”, 
which include bioremediation‘ and probiotics’. Although safeguards 
have been designed to control cell growth by essential gene regulation®, 
inducible toxin switches’ and engineered auxotrophies*, these ap- 
proaches are compromised by cross-feeding of essential metabolites, 
leaked expression of essential genes, or genetic mutations””°. Here 
we describe the construction of a series of genomically recoded organisms
(GROs)" whose growth is restricted by the expression of multiple
essential genes that depend on exogenously supplied synthetic
amino acids (sAAs). We introduced a Methanocaldococcus jannaschii 
tRNA:aminoacyl-tRNA synthetase pair into the chromosome of a 
GRO derived from Escherichia coli that lacks all TAG codons and 
release factor 1, endowing this organism with the orthogonal trans- 
lational components to convert TAG into a dedicated sense codon for 
sAAs. Using multiplex automated genome engineering”, we intro- 
duced in-frame TAG codons into 22 essential genes, linking their 
expression to the incorporation of synthetic phenylalanine-derived 
amino acids. Of the 60 sAA-dependent variants isolated, a notable 
strain harbouring three TAG codons in conserved functional residues” 
of MurG, DnaA and SerS and containing targeted tRNA deletions 
maintained robust growth and exhibited undetectable escape frequencies
upon culturing ~10¹¹ cells on solid media for 7 days or in
liquid media for 20 days. This is a significant improvement over existing
biocontainment approaches~** °. We constructed synthetic auxotrophs
dependent on sAAs that were not rescued by cross-feeding
in environmental growth assays. These auxotrophic GROs possess 
alternative genetic codes that impart genetic isolation by impeding 
horizontal gene transfer" and now depend on the use of synthetic 
biochemical building blocks, advancing orthogonal barriers between 
engineered organisms and the environment. 

The advent of recombinant DNA technologies in the 1970s estab- 
lished genetic cloning methods", ushering in the era of biotechnology. 
Over the past decade, synthetic biology has fuelled the emergence of 
GMOs with increased sophistication as common and valued solutions 
in clinical, industrial and environmental settings’*”, necessitating the 
development of safety and security measures first outlined in the 1975 
Asilomar conference on recombinant DNA”. While guidelines for phys- 
ical containment and safe use of organisms have been widely adopted, 
intrinsic biocontainment—biological barriers limiting the spread and 
survival of microorganisms in natural environments—remains a defin- 
ing challenge. Existing biocontainment strategies employ natural auxo- 
trophies or conditional suicide switches where top safeguards meet the 
10⁻⁸ NIH standard (http://osp.od.nih.gov/office-biotechnology-activities/biosafety/nih-guidelines)
for escape frequencies (that is, one escape mutant
per 10⁸ cells), but can be compromised by metabolic cross-feeding or
genetic mutation’”°. We hypothesized that engineering dependencies 
on synthetic biochemical building blocks would enhance existing con- 
tainment strategies by establishing orthogonal barriers not feasible in 
organisms with a standard genetic code. 

Our approach to engineering biocontainment used a GRO lacking all 
instances of the TAG codon and release factor 1 (terminates translation 
at UAA and UAG), eliminating termination of translation at UAG and 
endowing the organism with increased viral resistance, a common form 
of horizontal gene transfer (HGT). The TAG codon was then converted 
to a sense codon through the introduction of an orthogonal translation 
system (OTS) containing an aminoacyl-tRNA synthetase (aaRS):tRNA 
pair, permitting site-specific incorporation of sAAs into proteins with- 
out impairing cellular fitness'’. Leveraging these unique properties of 
the GRO, we sought to reintroduce the TAG codon into essential genes 
to restrict growth to defined media containing sAAs. We also eliminated 
the use of multi-copy plasmids, which reduce viability and growth", 
impose biosynthetic burden, persist poorly in host cells over time’’, and 
increase the risk of acquiring genetic escape mutants’, by manipulating 
native chromosomal essential genes and integrating the OTS into the 
genome. To engineer synthetic auxotrophies, we chose essential genes 
of varying expression levels (Methods), many of whose functions (for 
example, replication or translation) cannot be complemented by cross- 
feeding of metabolites. Genes dispersed throughout the genome were 
selected to prevent a single HGT event from compromising containment. 

We pursued three strategies to engineer dependence on non-toxic, 
membrane-permeable, and well-characterized sAAs through the introduction
of TAG codons into essential genes: (1) insertion at the amino
terminus; (2) substitution of residues with computationally predicted 
tolerances'’; and (3) substitution of conserved’ residues at functional 
sites (Fig. 1a). We initially pursued the first two strategies in a GRO containing
an OTS optimized for the sAA p-acetyl-L-phenylalanine (pAcF,
α; see Methods for a detailed explanation of the nomenclature used).
Using multiplex automated genome engineering (MAGE)”, we tar- 
geted 155 codons for TAG incorporation via 4 pools of oligonucleotides 
(Supplementary Tables 1 and 2) in permissive media containing pAcF 
and L-arabinose (aaRS induction) (Fig. 1b). After replica plating on non- 
permissive media lacking pAcF and L-arabinose, we isolated eight pAcF 
auxotrophs with one strain containing two TAGs in essential genes 
(Fig. 1c and Supplementary Table 3). To determine whether our strategy
was capable of creating synthetic auxotrophs dependent on other sAAs, 
MAGE was used to mutagenize annotated residues in the sAA binding 
pocket of the pAcF aaRS (Supplementary Table 4) to accommodate 
p-iodo-L-phenylalanine (pIF, β) or p-azido-L-phenylalanine (pAzF, γ)
in two strains. After MAGE-based incorporation of TAGs and selec- 
tions on permissive and non-permissive solid media, we obtained 8 pIF 
and 23 pAzF auxotrophs harbouring 1-4 TAGs at 30 distinct loci across 
20 essential genes (Supplementary Tables 3 and 5). Together, these data 
Figure 1 | Strategy used to engineer GROs to depend on sAAs for growth.
a, Approaches used to identify suitable loci within essential proteins for sAA
(blue) incorporation. b, MAGE was used for site-specific incorporation of
TAG codons into essential genes of a GRO lacking all natural TAG codons
(ΔTAG) and release factor 1 (ΔprfA), and containing an OTS (green) consisting
of the M. jannaschii aaRS and cognate UAG-decoding tRNA. c, Synthetic
auxotrophs that depend on sAAs for growth were isolated.


demonstrate the modularity of our approach and that synthetic auxo- 
trophs can be engineered across many essential genes using multiple 
sAAs (Extended Data Fig. 1). 

Measurements of doubling time in permissive media revealed min- 
imal or no fitness impairment of synthetic auxotrophs relative to their 
non-contained ancestors with a genomically integrated OTS (Fig. 2a 
and Supplementary Table 6). To quantify the degree of containment, we 
measured the ratio of colony-forming units (c.f.u.) on non-permissive 
to permissive solid media and observed a range of escape frequencies 
spanning 10° * to 10” (Fig. 2b). One notable strain, DnaX.Y113α, preserved
the doubling time of its non-contained ancestor (Fig. 2a) while
maintaining an escape frequency of 6.7 × 10°” (Fig. 2b). We directly
investigated pAcF incorporation in DnaX.Y113α using mass spectrometry
and identified peptides containing pAcF at Y113 (Fig. 2c).
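
The escape-frequency measurement described above is a ratio of colony counts, which can be sketched as follows. The function name and the colony counts are hypothetical, and reporting an upper bound of one colony per plated cell count when no escapees are seen is an assumption about how a detection limit might be expressed, not a statement of the authors' method.

```python
def escape_frequency(cfu_nonpermissive, cfu_permissive):
    """Return (frequency, is_upper_bound) from colony counts.

    When no colonies grow on non-permissive media, the value returned is
    a detection-limit bound of one colony per plated cell (assumption).
    """
    if cfu_permissive <= 0:
        raise ValueError("permissive c.f.u. must be positive")
    if cfu_nonpermissive == 0:
        return 1.0 / cfu_permissive, True
    return cfu_nonpermissive / cfu_permissive, False

# Hypothetical counts, for illustration only
freq, bounded = escape_frequency(12, 3e8)
print(f"escape frequency = {freq:.1e}, upper bound: {bounded}")

limit, limit_bounded = escape_frequency(0, 1e11)
print(f"escape frequency < {limit:.1e}, upper bound: {limit_bounded}")
```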

To investigate the mechanisms of escape in mutants derived from
synthetic auxotrophs with one essential TAG codon, we performed targeted
sequencing and observed transition mutations (A•T to G•C and
G•C to A•T) commonly observed in mismatch-repair-deficient strains
(ΔmutS)”*. All isolated DnaX.Y113α escape mutants incorporate tryptophan
by mutation of the TAG codon to TGG. SecY.Y122α escape mutants
incorporate glutamine by mutation of glnV to form a glutamine amber
suppressor or mutation of the secY.Y122 TAG codon to CAG (Supplementary
Table 7). One of three SecY.Y122α escape mutants was
wild type at the secY.Y122 TAG codon and putative amber-suppressor
loci®, but whole-genome sequencing (Supplementary Table 8) revealed
is implicated in ribosome fidelity” and is the causal mutation leading 
to escape in this mutant (Extended Data Fig. 2). 
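
The link between transition mutations and the observed escape codons can be made explicit by enumerating the single-base transitions of TAG. The sketch below is illustrative: it uses only the handful of standard-genetic-code entries needed here, and the names `TRANSITIONS`, `CODON_TO_AA` and `single_transitions` are hypothetical.

```python
# Transition mutations swap purine for purine or pyrimidine for pyrimidine
TRANSITIONS = {"A": "G", "G": "A", "C": "T", "T": "C"}

# Partial standard genetic code: only the codons reachable from TAG here
CODON_TO_AA = {"CAG": "Gln", "TGG": "Trp", "TAA": "stop"}

def single_transitions(codon):
    """Map each single-transition mutant of a codon to what it encodes."""
    mutants = {}
    for position, base in enumerate(codon):
        mutant = codon[:position] + TRANSITIONS[base] + codon[position + 1:]
        mutants[mutant] = CODON_TO_AA[mutant]
    return mutants

tag_mutants = single_transitions("TAG")
for codon, meaning in tag_mutants.items():
    print(f"TAG -> {codon}: {meaning}")
```

Two of the three single transitions convert TAG into a sense codon (TGG for tryptophan, CAG for glutamine), which matches the codon-level escape routes reported above; the third yields the stop codon TAA.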

These escape mechanisms informed two sets of experiments to engi- 
neer strains with lower escape frequencies. First, we sought to create 
synthetic auxotrophs with an increased number of TAGs (Fig. 2d) by
combining TAGs from strains possessing the lowest escape frequencies 
Figure 2 | Characterization of strains dependent on sAA incorporation in
essential proteins. a, Doubling time ratios for the non-contained ancestor to
synthetic auxotroph containing one TAG. b, Escape frequencies of strains
from a. c, Superimposed MS/MS spectra for DnaX peptides from DnaX.Y113α
(red) and the non-contained ancestor rEc.α (blue). Overlapping peaks are
purple and a mass shift relative to rEc.α identifies Y113 as the pAcF
incorporation site in DnaX.Y113α; see Methods. d, e, Escape frequencies for
strains with multiple TAG codons (d) and/or functional mismatch repair (e)
(prime, mutS⁺). For all plots, average values of three technical replicates are
plotted with error bars representing ±s.d. Reported results were repeated at least
three times in independent experiments.


(that is, dnaX.Y113, lspA.Y54 and secY.Y122) into a single strain. In strains
containing two TAGs, the escape frequency was reduced to 1.4 × 107”
(rEc.γ.dB.26) and 1.4 × 107° (rEc.β.dB.9) (strain annotations are listed
in Supplementary Table 6 and a complete description of our nomenclature
can be found in the Methods). In strains containing three TAGs,
escape frequencies were further reduced to 5.0 × 10° (rEc.β.dC.11) and
4.7 × 10” (rEc.β.dC.12). We used MAGE to assess quantitatively the
effects of non-synonymous mutations at individual TAG codons in 
strains incorporating pIF at SecY.Y122, DnaX.Y113, and LspA.Y54 by
mutating the TAG site to sense codons for all 20 natural amino acids.
Strains containing multiple TAGs were less likely to survive when one
TAG was compromised (Extended Data Fig. 3). In a second set of experiments,
we restored mutS (prime symbol (′) denotes a mutS⁺ strain)
and observed a decreased escape frequency in strains by 1.5- to 3.5-fold
(Fig. 2e and Supplementary Table 6). Escape mutants derived
from mutS⁺ higher-order TAG strains exhibited impaired fitness with
1.14- to 1.28-fold greater doubling times than their contained ancestors.
Whole-genome sequencing was performed on these escape mutants
and revealed mutations of tyrosine tRNAs to form tyrosine amber
(UAG) or ochre (UAA) suppressors (Supplementary Table 9).

To reduce escape frequencies below ~ 10° and eliminate rescue by 
natural amino acids, we pursued a third strategy to replace conserved 
and functional residues in essential proteins with sAAs (Fig. 1a). Using
the Conserved Domain Database””’, we searched all essential proteins for 
tyrosine, tryptophan and phenylalanine residues involved in protein— 
protein interactions (for example, dimerization) or located within active 
sites to identify candidates suitable for replacement with phenylalanine- 
derived sAAs. After targeted insertion of TAG codons using MAGE, we 
isolated four synthetic auxotrophs with pAzF incorporated at GlyQ.Y226
(glycyl-tRNA synthetase α subunit, dimer interface), Lnt.Y388 (apolipoprotein
N-acyltransferase, active site), MurG.F243 (N-acetylglucosaminyl
transferase, active site), and DnaA.W6 (chromosomal replication initiator
protein, oligomerization site”*) in strains with minor fitness impairments
(Fig. 3a) and escape frequencies spanning 10 ° to 10 ” (Fig. 3b).
Identical experiments to incorporate pAcF and pIF failed to generate syn- 
thetic auxotrophs, suggesting that the targeted residues are recalcitrant 
Figure 3 | Characterization of strains dependent on sAA incorporation at
active and dimerization sites in essential proteins. a, Doubling time ratios for
the non-contained ancestor to pAzF auxotroph with one or more TAGs at
functional loci calculated from growth in 5 mM pAzF and 0.2% L-arabinose.
b, Escape frequencies of strains in a; bars represent escape frequencies below
the detection limit; average escape frequencies are plotted. c, Representative
assay surveying tolerance of TAG loci to 20 amino acids in different synthetic
auxotrophs and expressed as log of total cell survival. A + symbol indicates a
TAG codon at the locus in the background strain; − indicates the wild-type
codon; blue and yellow indicate high and low tolerance to substitution,
respectively; see Methods. d, Representative escape assay monitoring escape
frequencies up to 7 days after plating on solid non-permissive media; hollow
symbols/dashed lines, no observed escape mutants; see Methods. e, Temporal
monitoring of permissive (P, blue) and non-permissive (NP, red) cultures
inoculated with ~10⁹, 10¹⁰ or 10¹¹ cells of rEc.γ.dC.46′.ΔtY by OD600.
f, Associated c.f.u. from e as sampled on permissive (solid lines) or non-permissive
(dashed lines) solid media; c.f.u. were never observed on non-permissive
solid media; hollow data points indicate no observed c.f.u.
g, Maximum OD600 values during growth in LB media across a concentration
gradient of pAzF and L-arabinose. For all plots, average values of three technical
replicates are plotted with error bars representing ±s.d. Reported results were
repeated at least three times in independent experiments.
to replacement by pAcF and pIF. Since pAzF was able to replace conserved
and functional tyrosine, phenylalanine and tryptophan residues
across several essential proteins, we hypothesized that engineering 
strains to contain higher-order TAG combinations would limit escape 
by mutations that cause incorporation of natural amino acids at mul- 
tiple TAG codons. Escape frequencies of 1.6 X 10°’ and 2.3 X 10° were 
observed for strains containing two TAG codons: rEc.γ.dB.41 (DnaA.W6
and MurG.F243) and rEc.γ.dB.43 (DnaA.W6 and SerS.F213), respectively.
Upon restoring mutS, the escape frequency of rEc.γ.dB.41′ fell to
6.0 × 10⁻¹⁰ (Fig. 3b). Merging all three sites into one strain (rEc.γ.dC.46)
and its mutS⁺ derivative (rEc.γ.dC.46′) led to escape frequencies of
<7.9 × 10⁻¹¹ and <4.4 × 10⁻¹¹ (below the detection limit of our plate-based
assay), respectively (Supplementary Table 6).

Temporal monitoring of rEc.γ.dC.46′ revealed the emergence of
growth-impaired escape mutants 2 days post-plating on non-permissive
solid media (Fig. 3d). Sequencing of escape mutants derived from strains
rEc.γ.dB.41′ and rEc.γ.dC.46′ revealed amber-suppressor-forming mutations
at one of three tyrosine tRNAs (tyrT, tyrV, tyrU) with growth
impairments spanning 1.61- to 2.10-fold increases in doubling time
relative to contained ancestors (Supplementary Tables 9 and 10). Given
that E. coli contains three tyrosine tRNAs, we hypothesized that deletion
of tyrT and tyrV would prevent acquisition of amber-suppressor-forming
mutations at tyrU, as preservation of this single remaining copy
of tRNA(Tyr) would be required to maintain fidelity of tyrosine
incorporation during protein synthesis. We used λ-Red recombination to
replace tyrT and tyrV with a chloramphenicol resistance gene in
rEc.β.dC.12′, rEc.β.dC.12′.E7 (an escape mutant of rEc.β.dC.12′) and
rEc.γ.dC.46′. Deletion of tyrT and tyrV restored containment of the escape
mutant, establishing the causal escape mechanism (Extended Data Fig. 4).
Moreover, tyrT/V deletions in rEc.β.dC.12′.ΔtY and rEc.γ.dC.46′.ΔtY
decreased escape frequencies below detectable levels (<4.9 × 10⁻¹² and
<6.3 × 10⁻¹², respectively) over the 7-day observation period (Fig. 3d
and Supplementary Table 11).

To challenge strains rEc.β.dC.12′.ΔtY and rEc.γ.dC.46′.ΔtY with
natural amino acids and mimic a potential HGT event, we introduced
constructs containing phenylalanine or tryptophan amber-suppressor
tRNAs. While growth of suppressor-containing strains was equivalent
to the cognate-contained ancestor in permissive liquid media, severely
impaired growth or no growth was observed in non-permissive media
(Extended Data Fig. 5). Such findings are further supported by experiments
in which a large (~10¹¹) inoculum of cells challenged on solid or in
liquid (see below) non-permissive media did not yield escape mutants,
despite the ample opportunity this provides for natural formation of a
phenylalanine amber suppressor via mutation of one of two native copies
of tRNA(Phe). These data support our hypothesis that synthetic auxotrophs
containing higher-order TAG combinations depend on the sAA and limit growth
from natural amino acids.

To interrogate the long-term stability of synthetic auxotrophs where
escape mutant formation is not limited by a colony growth environment,
temporal monitoring of rEc.γ.dC.46′.ΔtY was performed on large cell
populations in liquid culture (1 litre of Luria-Bertani (LB) media) for
7 days with frequent OD600 measurements to track cell growth (Fig. 3e).
Inoculation of ~10¹¹ cells in permissive media led to a confluent culture
of contained cells within 24 h. Inoculation of ~10⁹, ~10¹⁰ and ~10¹¹
cells into non-permissive media revealed transient growth, which we
propose is due to residual pAzF and L-arabinose from large inoculums,
followed by a sustained decrease in cell density and growth termination.
Cell survival and escape from liquid cultures was monitored by
quantifying c.f.u. on permissive and non-permissive solid media,
respectively (Fig. 3f). Plating on permissive solid media revealed a drop
in c.f.u. to below the limit of detection within 1 day from the
non-permissive flask inoculated with ~10⁹ cells and 3 days from
non-permissive flasks inoculated with ~10¹⁰ or ~10¹¹ cells (Fig. 3f). No
c.f.u. were observed from any culture plated on non-permissive solid media.
To confirm the absence of a single escape mutant following an extended
20-day growth period (Extended Data Fig. 6), the non-permissive and
permissive cultures inoculated with ~10¹¹ cells were plated across 30
non-permissive plates. Escape mutants were not observed and escape
frequencies remained below the detection limit after 7 days, which is
comparable to the solid media results. These results demonstrate that
rEc.γ.dC.46′.ΔtY depends on pAzF, maintains long-term stability of
biocontainment in permissive liquid media and exhibits termination of
growth in non-permissive media.

To further investigate the dependency on sAAs, liquid growth profiles
were collected for synthetic auxotrophs across sAA and L-arabinose
concentration gradients. Growth of rEc.γ.dC.46′.ΔtY was not observed
below 0.002% L-arabinose and 0.5 mM pAzF (Fig. 3g and Extended Data
Fig. 7). Growth increased in a dose-dependent manner with increasing
concentrations of pAzF and L-arabinose, where 5 mM pAzF and 0.2%
L-arabinose was optimal for fitness (that is, maximum OD600 and minimum
doubling time). In an equivalent experiment with rEc.β.dC.12′.ΔtY,
1 mM pIF and 0.2% L-arabinose was optimal for fitness (Extended Data
Fig. 8). Since growth was not observed in media lacking either L-arabinose
or the sAA, these data further support the dependency of synthetic
auxotrophs on sAAs.

To determine whether a synthetic auxotroph could be rescued by
metabolic cross-feeding, we evaluated the viability of strains on diverse
media types. We grew wild-type MG1655 Escherichia coli, a biotin
auxotroph (EcNR2), a non-contained GRO (rEc.γ), and the pAzF synthetic
auxotroph (rEc.γ.dC.46′) on solid media containing both pAzF/
L-arabinose and biotin, either pAzF/L-arabinose or biotin, and on plates
lacking small molecules (Fig. 4). Despite biotin auxotrophy, growth of
EcNR2 on rich defined media without biotin was rescued in close proximity
to wild-type E. coli, suggesting cross-feeding of essential metabolites
(Extended Data Fig. 9). Blood agar and soil extracts without biotin
or pAzF/L-arabinose supplementation supported growth of all strains
except the synthetic auxotroph, which only grew on media supplemented
with pAzF and L-arabinose. These data suggest that synthetic auxotrophies
could lead to a more viable containment strategy for clinical (for
example, blood) and environmental (for example, soil) settings, where
metabolic auxotrophies can be overcome by proximal, metabolically
competent strains.

Synthetic auxotrophs utilize unnatural biochemical building blocks 
necessary for essential proteins with activities that cannot be comple- 
mented by naturally occurring small molecules. We have previously 
shown that genomic recoding interferes with HGT from viruses", and 
have now extended orthogonal barriers by engineering two synthetic 
auxotrophs using two unique sAAs that exhibit escape frequencies below 
our detection limit (<6.3 × 10⁻¹²). These synthetic auxotrophs possess
three essential TAGs at loci dispersed throughout the genome (0.84, 
0.86 and 2.9 megabases apart), thereby limiting the likelihood that a 
single HGT event could compromise containment (Extended Data 
Fig. 1). These orthogonal barriers can be expanded further by incorpo- 
ration of additional TAG sense codons across more than three essential 

Figure 4 | Investigating the viability of synthetic auxotrophs on diverse
media types. Rescue by cross-feeding shown through spotting on diverse
media types with or without pAzF/L-arabinose (pAzF + ara) and biotin
supplementation; EcNR2, rEc.γ, and rEc.γ.dC.46′ are auxotrophic for biotin
(ΔbioA/B) and rEc.γ.dC.46′ is also a pAzF auxotroph.

genes, but will probably require concurrent advances in OTS performance
to maintain fitness and viability (for example, enhanced activity
and specificity of aaRS:tRNA pairs”’). Our modular approach to bio- 
containment limits growth to synthetic environments containing unnat- 
ural biochemical building blocks with diverse chemistries. We anticipate 
that further genome recoding efforts''”°”’ will enable auxotrophies for 
multiple sAAs that could be enhanced by other synthetic components 
including unnatural nucleotides and extended genetic alphabets*”°. 
Orthogonal biological systems employing multi-level containment mech- 
anisms are uniquely suited to provide safe GMOs for clinical, environ- 
mental and industrial applications’. 

Despite the breadth of genomic diversity found in nature, all species 
utilize the same biochemical foundation to sustain life. The semantic 
architecture of the GRO employs orthogonal translational components, 
establishing the basis for a synthetic molecular language that relieves 
limitations on natural biological functions by depending on the incor- 
poration of sAAs with exotic chemistries. This work sets the stage for 
future experiments to probe the optimality of the natural genetic code 
and to explore the plasticity of proteins and whole organisms capable 
of sampling new evolutionary landscapes. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Reagents. Oligonucleotide synthesis was performed by Integrated DNA
Technologies (IDT) and Keck Foundation Biotechnology Resource Laboratory at
Yale University (Supplementary Table 1). Unless otherwise stated, all cultures
were grown in LB media. The following selective agents and inducers were used
at the specified concentrations: ampicillin (amp, 50 µg ml⁻¹), carbenicillin
(carb, 50 µg ml⁻¹), zeocin (zeo, 10 µg ml⁻¹), spectinomycin (spec, 95 µg ml⁻¹),
sodium dodecyl sulphate (SDS, 0.005% w/v), isopropyl-β-D-1-thiogalactopyranoside
(IPTG, 100 µM), 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal,
40 µg ml⁻¹), and L-arabinose (ara, 0.2% w/v unless otherwise indicated). sAAs
were used at 1 mM unless otherwise indicated and purchased from PepTech (pAcF,
AL624-2), BaChem (pIF, F-3075.0005) and Chem-Impex International (pAzF, 03376).

Plasmids. All tRNAs used to assess tolerance for tryptophan and phenylalanine at
TAG codons were contained within the pTech plasmid backbone and driven by
the lpp promoter. Isothermal assembly was used to replace the chloramphenicol
acetyltransferase (cat) gene with the sh ble gene for resistance to zeocin.

The supU amber suppressor tRNA was used to assess tolerance for tryptophan
and a phenylalanine amber suppressor was used to assess tolerance for
phenylalanine. pTech-supU was provided by the laboratory of D. Söll and supPhe
was synthesized by IDT and isothermally assembled into the pTech plasmid
backbone to obtain pTech-supPhe.

Conversion of aminoacyl-tRNA synthetase specificity. The pAcF OTS was
integrated into the genome of the GRO linked to a counter-selectable gene tolC.
Co-selection multiplex automated genome engineering (CoS-MAGE) was used as
described previously to introduce annotated mutations to the sAA binding pocket
of the aaRS for specificity towards pAzF or pIF (Supplementary Table 4). Sanger 
sequencing was used to verify these mutations. Conversion of sAA-specificity was 
assessed in sequence-verified clones upon growth in the presence of sAA incorpo- 
ration and episomally-expressed GFP containing an in-frame TAG codon at resi- 
due 151 within the protein product. OTS-mediated suppression of this codon with 
the sAA (that is, pAzF, pIF) generated a full-length fluorescent product, indicating 
that sAA incorporation had occurred and specificity was achieved. 

TAG codon incorporation into essential genes. We applied three unique strat- 
egies to identify permissive sites in essential genes for TAG codon incorporation 
(Fig. 1). In our first strategy, a subset of essential genes** were chosen for the incor- 
poration of one or more TAG codons immediately after the start codon to encode 
a sAA at the amino terminus. To explore a diverse library of incorporation targets 
within the E. coli proteome, in our second strategy we applied the sorting intolerant 
from tolerant (SIFT) algorithm'* (downloaded on the Yale Biomedical High Per- 
formance Computing Cluster, http://sift.jcvi.org/) to the entire panel of essential 
E. coli proteins**. SIFT is an algorithm that uses sequence homology to predict the 
tolerance of amino acid substitutions at different indices. In our workflow, genes 
were first split into three categories on the basis of wild-type expression level*’, and 
a further four subgroups by genomic location with the goal of targeting essential 
genes dispersed throughout the E. coli chromosome. Next, genes shown to be essen- 
tial by multiple studies** were passed through SIFT. For each essential gene, two 
high, medium, and low tolerance sites were targeted for TAG incorporation by 
MAGE. By this approach, we were able to sample diverse residue types in proteins 
with varying wild-type expression levels. 

In our third strategy, using the conserved domain database”’, we searched within 
all annotated essential proteins for tyrosine, tryptophan and phenylalanine resi- 
dues predicted to participate in essential enzymatic reactions or protein-protein 
interactions (e.g., dimerization). To minimize the probability that the added func- 
tionality of the sAA would perturb protein function, we targeted sites that were 
observed to occur as tyrosine or tryptophan in different homologues. 

GROs containing an OTS integrated into the chromosome were grown to mid-log
phase in liquid permissive LB media and four cycles of MAGE were performed
per pool of mutagenic oligonucleotides (oligonucleotide concentration = 15 µM)
as described previously. To isolate synthetic auxotrophs, mutagenized cultures
were plated on solid media and replica plated onto non-permissive media. To iden- 
tify TAG incorporation loci, multiplex allele-specific colony (MASC) PCR was used 
to interrogate pools of up to eleven targeted loci as previously described”, followed 
by verification using Sanger sequencing. 

Genotyping. Sanger sequencing was performed by the Keck DNA Sequencing
Facility at Yale University or by GENEWIZ, Inc. Genomic DNA for whole genome
sequencing was prepared using a Qiagen Genomic DNA purification kit. Illumina
libraries were prepared by the Yale Center for Genomic Analysis or the Dana
Farber Cancer Institute. Illumina HiSeq or MiSeq sequencing systems were used for
whole genome sequencing to generate 50- or 150-base-pair (bp) paired-end reads,
respectively.

Whole-genome sequencing was used to analyse three escape mutants per
background. In all cases, the direct ancestor to the escape mutant was also
analysed. SNPs in escape mutants were identified relative to the reference
genome E. coli C321.ΔA (CP006698.1, GI:54981157) using a previously described
software pipeline. SNPs listed in Supplementary Table 8 and Supplementary
Table 10 were called by Freebayes in escape mutants.

Strains. All GROs used in this study are derived from C321.ΔA (CP006698.1,
GI:54981157), which lacks all TAG codons and release factor 1. This strain is
derived from strain EcNR2 (ΔmutS:catΔ(ybhB-bioAB):[cI857Δ(cro-ea59):tetR-bla]),
modified from E. coli K-12 substr. MG1655. In all synthetic auxotrophs, the
M. jannaschii-derived OTS was genomically integrated into the GRO fused to the
counter-selectable gene tolC. The OTS consists of an L-arabinose-inducible aaRS
driven by the araBAD promoter, and a constitutively expressed cognate
amber-decoding tRNA driven by the proK promoter. All genome modifications that
required incorporation of dsDNA (for example, modifications to the mutS gene or
incorporation of antibiotic selectable markers) were performed via λ-Red
recombination.

Nomenclature of genomically recoded organisms and synthetic auxotrophs.
To succinctly name strains, we have introduced a new one-letter amino acid code for
sAAs using Greek lettering (pAcF = α, pIF = β, and pAzF = γ). Non-contained
GROs lacking essential TAG codons are named according to the one letter sAA code
for the specific OTS present in the organism. For example, a ΔTAG GRO with a
genomically integrated pAcF OTS is rEc.α.

Biocontained GROs containing essential TAG codons are named according to 
two conventions based on the number of essential TAG codons in the auxotroph: 
(1) Strains with one essential TAG are named by the essential protein containing the 
sAA and the position and identity of the residue substituted therein (for example, 
a strain containing pAcF at residue 113 in DnaX is DnaX.Y113α); (2) Strains
containing more than one essential TAG are named using the one letter sAA code for
which the organism is auxotrophic. This is followed by a dependency code, d, indi- 
cating the presence of two (dB), three (dC) or four (dD) essential TAG codons, and 
then by a TAG combination number that uniquely identifies the specific combina- 
tion of TAGs in the strain. Combinations are numbered from one through 46 and 
are listed in Supplementary Table 6. 

Mismatch repair. The presence of a prime symbol (′) following the TAG
combination number indicates that mutS has been restored at its native locus,
imparting functional mismatch repair to the organism.

tRNA redundancy. Following the TAG combination number, Δt indicates that
tRNA redundancy has been eliminated and is followed by the relevant amino
acid (for example, a strain in which two of three total tyrosine
tRNAs were deleted is ΔtY).

Escape mutant identity. At least three escape mutants were characterized per strain 
background that permitted an escape mutant. An escape mutant is designated by a 
number following the letter ‘E’ (for example, E1). 
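
The naming rules above can be sketched as a small formatter. This is only an illustrative reading of the convention, not code from the study: the function and parameter names are hypothetical, the one-letter sAA codes are taken to be the Greek letters α, β and γ, and the relative order of the Δt and escape-mutant suffixes when both are present is an assumption.

```python
# Greek one-letter codes for the three sAAs named in the text.
SAA_CODE = {"pAcF": "α", "pIF": "β", "pAzF": "γ"}
# Dependency codes for two, three or four essential TAG codons.
DEPENDENCY_CODE = {2: "dB", 3: "dC", 4: "dD"}


def single_tag_name(protein, residue, position, saa):
    """Name a strain with one essential TAG, e.g. DnaX.Y113α."""
    return f"{protein}.{residue}{position}{SAA_CODE[saa]}"


def multi_tag_name(saa, n_tags, combination, muts_restored=False,
                   trna_deleted_for=None, escape_mutant=None):
    """Name a strain with two to four essential TAGs, e.g. rEc.γ.dC.46′.ΔtY."""
    prime = "′" if muts_restored else ""  # prime marks restored mutS
    parts = ["rEc", SAA_CODE[saa], DEPENDENCY_CODE[n_tags],
             f"{combination}{prime}"]
    if trna_deleted_for is not None:
        parts.append(f"Δt{trna_deleted_for}")  # tRNA redundancy removed
    if escape_mutant is not None:
        parts.append(f"E{escape_mutant}")  # numbered escape mutant
    return ".".join(parts)


print(single_tag_name("DnaX", "Y", 113, "pAcF"))
print(multi_tag_name("pAzF", 3, 46, muts_restored=True, trna_deleted_for="Y"))
```

Keeping the sAA and dependency codes in lookup tables means an unsupported sAA or TAG count fails with a `KeyError` rather than producing a malformed name.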

The summary of synthetic auxotrophs generated in this study illustrated in 
Extended Data Fig. 1 was constructed using the Circos” software. 

Strains were grown at 34 °C in flat-bottomed 96-well plates containing 150 µl of
LB media permissive for sAA incorporation, unless otherwise indicated. Strains were
washed twice with sterile dH2O before assessing growth in non-permissive media.
Kinetic growth (OD600) was monitored on a BioTek plate reader at ten-minute
intervals in triplicate. Raw OD600 data from the plate reader were normalized to
standard absorbance (OD600 at 1 cm path length) values using an empirically derived
calibration curve (y = 1.9704x − 0.1183, where y = OD600 at 1 cm path length and
x = OD600 from plate reader; R² = 0.998). Doubling times (DTs) were calculated in
MATLAB using custom code. Reported values are the average of three technical
replicates where error bars represent ± s.d. All reported results repeated at
least three times in independent experiments. Maximum OD600 values were
obtained after 24 h of growth and represent the average of three technical
replicates.
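
The normalization and doubling-time steps can be illustrated with a short sketch. The calibration constants are the ones reported above; everything else is an assumption, since the custom MATLAB code is not described. In particular, the log-linear least-squares fit shown here is one common way to estimate a doubling time and is not necessarily the method the authors used, and the readings are synthetic.

```python
import math

# Calibration constants reported in the text: y = 1.9704x - 0.1183.
SLOPE = 1.9704
INTERCEPT = -0.1183


def to_standard_od(raw_od):
    """Convert a plate-reader OD600 reading to its 1 cm path length equivalent."""
    return SLOPE * raw_od + INTERCEPT


def doubling_time(times_min, od_values):
    """Estimate doubling time (minutes) from exponential-phase readings.

    Assumed method: fit ln(OD) against time by ordinary least squares;
    the doubling time is ln(2) divided by the fitted growth rate.
    """
    logs = [math.log(od) for od in od_values]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    return math.log(2) / (sxy / sxx)


# Synthetic readings at ten-minute intervals, doubling every 40 minutes.
times = [0, 10, 20, 30, 40, 50, 60]
ods = [0.1 * 2 ** (t / 40) for t in times]
print(round(to_standard_od(0.5), 4))
print(round(doubling_time(times, ods), 2))
```

In practice the fit would be restricted to the exponential phase of each growth curve; how that window is chosen is left open here.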

Mass spectrometry. Histidine-tagged proteins were purified on Ni-NTA resin
(Qiagen). Resolution of purity was assessed via SDS-PAGE. In-gel digestion was
performed similarly to previously described methods. Proteins were stained and
imaged within the gel using Coomassie blue (R-250). A band corresponding to the
molecular weight of DnaX was excised. Gel slices were processed into 1-mm cubes,
washed in 1:1 (v/v) 50% CH3CN/50 mM NH4HCO3, and then washed in 1:1 (v/v)
50% CH3CN/10 mM NH4HCO3. 13.33 ng µl⁻¹ trypsin solution in 9:1 (v/v) 50 mM
NH4HCO3/50% CH3CN was added and samples were incubated overnight at 37 °C.
Peptides were extracted with 1:2 (v/v) 5% formic acid/50% CH3CN and dried.
Peptides were desalted by reconstitution in 3:8 (v/v) 70% formic acid/0.1% TFA,
followed by loading onto a custom-made stage tip (2 × 1.06 mm punches of Empore
C18 extraction disks [3M] in a 200 µl pipette tip) activated with 80% CH3CN and
0.1% TFA. Tips were washed twice with 0.1% TFA and peptides eluted with 80%
CH3CN and 0.1% TFA. Peptides were dried and reconstituted for LC/MS/MS
analysis. Capillary LC/MS/MS was carried out using an LTQ Orbitrap Velos (Thermo
Scientific) with a nanoAcquity uHPLC (Waters) system as described previously.
The data were processed as described previously. MASCOT scores were above
the identity or extensive homology threshold and representative spectra are
illustrated in Fig. 2c.

Quantitative assessment of amino acid tolerance. The following workflow was
performed to assess the tolerance for natural amino acid substitution at residues
chosen for sAA incorporation. Strains were grown to mid-log phase in 1 ml of
permissive LB media, and MAGE was performed as described with modifications
described here. Post induction of λ-Red proteins, cells were transferred to
individual wells of a 96-well, V-bottomed plate, and washed twice at 4 °C with
sterile deionized water (dH2O). Cells were re-suspended in 50 µl of water or
1 µM mutagenic single-stranded DNA to convert a single in-frame essential TAG
codon to one of 20 sense codons, and electroporated in a 96-well plate using
the Harvard BTX electroporation system (2.4 kV, 750 Ω, 25 µF).
Electroporated cells were recovered in 1.5 ml of fresh permissive media in a 96-well
plate for 4 h at 34 °C with shaking. Cells were pelleted, washed twice with sterile
dH2O, and re-suspended in 200 µl of 1× PBS. Serial dilutions were made in 1× PBS
and 50 µl each of non-diluted and 100-fold diluted samples were plated on solid
non-permissive LB media. 50 µl each of higher dilutions were plated on permissive
solid media and all plates were incubated for 20 h at 34 °C.

Colony-forming units counted on non-permissive media were expressed as a 
ratio of total c.f.u. on permissive media. Since the frequency of MAGE-mediated 
recombination (~0.3)'? exceeds the escape frequencies of these background strains 
(S10 °), we directly correlated these ratios to amino acid tolerance. MATLAB was 
used to calculate the log10 of this ratio. Where no c.f.u. were observed on
non-permissive media, indicative of a highly intolerant substitution, a ratio could not be
calculated and these values were defaulted to NaN within MATLAB. A heat map 
was used to compare representative data for one experiment, where blue indicates a 
tolerated substitution and yellow, a non-tolerated substitution. 
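
As a rough illustration of the tolerance score described above, the sketch below computes the base-10 logarithm of the ratio of c.f.u. on non-permissive media to c.f.u. on permissive media, returning NaN when no colonies grew on non-permissive media. The function name and example counts are hypothetical, and the inputs are assumed to be dilution-corrected totals; the authors' MATLAB code is not reproduced here.

```python
import math


def log10_tolerance(cfu_nonpermissive, cfu_permissive):
    """Return log10 of the non-permissive to permissive c.f.u. ratio.

    Returns NaN when no colonies grew on non-permissive media, mirroring
    the NaN default described in the text.
    """
    if cfu_permissive <= 0:
        raise ValueError("permissive c.f.u. must be positive")
    if cfu_nonpermissive == 0:
        return math.nan
    return math.log10(cfu_nonpermissive / cfu_permissive)


# Hypothetical dilution-corrected counts for two substitutions.
print(log10_tolerance(2.0e4, 2.0e7))  # a partially tolerated substitution
print(log10_tolerance(0, 2.0e7))      # no survivors on non-permissive media
```

Values closer to zero indicate a better tolerated substitution; NaN entries mark substitutions with no survivors.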

Twenty-one separate MAGE experiments were performed as described above 
for each strain, per essential genomic TAG, to assess tolerance for each of the 20 
natural amino acids at each TAG site, plus a negative control (water). Strains with 
one TAG codon (SecY.Y122, DnaX.Y113β, LspA.Y54β, DnaA.W6γ, SerS.F213γ,
and MurG.F243γ) were assessed across 21 (including the negative control)
experiments per strain, strains with two TAG codons (rEc.β.dB.9) were assessed across
42 (including two negative control) experiments per strain, and strains with three
TAG codons (rEc.β.dC.12 and rEc.γ.dC.46) were assessed across 63 (including
three negative control) experiments per strain. Reported results repeated at least 
three times in independent experiments. 

Escape assays. Strains were grown in triplicate to late-log phase in 2 ml of permissive
LB media, pelleted, washed twice with sterile dH2O, and re-suspended in 200 µl
1× PBS. To obtain total and escape mutant c.f.u., serial dilutions were made and
equal volumes were plated on permissive and non-permissive solid media plates
(100 × 15 mm). Plates were incubated at 34 °C and escape frequency was calculated
as the total number of escape mutant c.f.u. observed per total cells plated. Reported
escape frequencies are means of three technical replicates where error bars
represent ±s.d. To isolate escape mutants in strains with lower escape frequencies,
~10¹⁰–10¹¹ cells were plated and the resulting escape frequency from a
representative escape assay is reported. When escape mutants were not detected upon
plating ~10¹⁰–10¹¹ cells, the escape frequency is described to be below the limit of
detection and reported as less than a frequency of one over the total number of cells
plated. In all cases, reported results repeated at least three times in independent
experiments. Where temporal monitoring of escape frequencies on solid media is reported
(Fig. 3d), representative escape assays are plotted and results repeated at least three
times in independent experiments.
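
The escape-frequency rule described above, including the detection-limit case, can be written down in a few lines. This is a minimal sketch under stated assumptions: the function name and the example cell counts are hypothetical, and the total number of cells plated is assumed to have been estimated already from the permissive plates.

```python
def escape_frequency(escape_cfu, total_cells_plated):
    """Return (frequency, below_detection_limit).

    With at least one escape mutant, the frequency is escape c.f.u. per
    total cells plated. With none, the value returned is the detection
    limit, one over the total cells plated, and the flag is True so the
    result can be reported as "less than" that value.
    """
    if total_cells_plated <= 0:
        raise ValueError("total_cells_plated must be positive")
    if escape_cfu == 0:
        return 1.0 / total_cells_plated, True
    return escape_cfu / total_cells_plated, False


# Hypothetical plate counts.
freq, below = escape_frequency(16, 1.0e8)
print(f"{freq:.1e}", below)
freq, below = escape_frequency(0, 2.0e11)
print(f"<{freq:.1e}", below)
```

Returning the flag alongside the number keeps a true measurement distinguishable from an upper bound when replicates are averaged or plotted.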

Liquid escape assays. Long-term liquid growth was assessed for two strain
backgrounds: the pAzF-dependent strain, rEc.γ.dC.46′.ΔtY, and its non-contained
ancestor, rEc.γ. Growth of rEc.γ.dC.46′.ΔtY was separately assessed in permissive
(+sAA/+L-arabinose) and non-permissive (−sAA/−L-arabinose) media and growth
of rEc.γ was assessed in non-permissive media. In all cases, flasks contained
carbenicillin to prevent contamination.

Strains were grown in 100 ml of LB media overnight. Cultures were then pelleted
and washed twice with the same volume of sterile dH2O. Washed pellets were
re-suspended in LB media plus or minus small molecules and this slurry was then
added to shake flasks containing 1 l of LB media plus or minus small molecules, at
time zero. At this first time point, a 1 ml sample was obtained from each flask, from
which the OD600 was measured and 50 µl was plated on both permissive (+sAA/
+L-arabinose/+carbenicillin) and non-permissive (−sAA/−L-arabinose/
+carbenicillin) solid LB media, in three technical replicates. Average c.f.u. counts
are reported and error bars represent ±s.d. In all cases, c.f.u. on solid media were
counted after 24 h of incubation at 34 °C. OD600 readings and c.f.u. counts were
collected in this manner for all subsequent time points for the following 20 days.

After 20 days of growth in liquid media, the two 1-l cultures of rEc.γ.dC.46′.ΔtY
grown in non-permissive and permissive media were interrogated for the presence
of a single escape mutant. The entire culture was pelleted, re-suspended in 7 ml of
1× PBS, and plated across 30 large non-permissive solid media plates that were
subsequently monitored for c.f.u. formation over the following 7-day period. All
reported results repeated at least three times in independent experiments.

Environmental challenges. Wild-type E. coli K-12 substr. MG1655, and additional
E. coli strains EcNR2, rEc.γ.dC.46′ (a pAzF auxotroph), and rEc.γ (non-contained
ancestor to rEc.γ.dC.46′) were grown to mid-log phase in 1 ml of LB media
supplemented with small molecules, where necessary. Cultures were washed three
times with sterile dH2O and re-suspended in 1 ml 1× PBS. A total of 16 two-fold
serial dilutions were made and spotted on the following solid media types: (1) LB,
(2) EZ rich defined (with modifications by Teknova) and containing 100× carbon
source (40% glycerol), (3) blood agar (Teknova), or (4) soil extract agar (HiMedia).
Prior to spotting, plates were topically supplemented with pAzF, L-arabinose, and/
or biotin, and dried for at least 1 h. Spotted plates were incubated for 1 day at 34 °C
and photographed in a Gel Doc XR+ running ImageLab v4.0.1 (BioRad).
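
For orientation, the range covered by a 16-step two-fold dilution series can be worked out directly. The sketch below is illustrative only: the function name is hypothetical, and it assumes the first spot is the first two-fold dilution rather than the undiluted culture, which the text does not specify.

```python
def twofold_dilution_factors(n_dilutions=16):
    """Return the fraction of the original culture at each two-fold step."""
    return [2.0 ** -step for step in range(1, n_dilutions + 1)]


factors = twofold_dilution_factors()
print(len(factors))        # number of dilutions spotted
print(1 / factors[-1])     # fold-dilution of the final spot
```

Under that assumption the final spot is diluted 65,536-fold relative to the washed culture, so the series spans nearly five orders of magnitude in cell density.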
Selectable markers used in this study. cat (1,015 bp) 
cctgtgacggaagatcacttcgcagaataaataaatcctgetetccctgttgataccgggaagccctggeccaacttttg 
gcgaaaatgagacettgatcgecacgtaagagettccaactttcaccataatgaaataagatcactaccegecetattt 
tttgagttgtcgagattttcaggagctaageaagctaaaatggagaaaaaaatcactggatataccaccettgatatatc 
ccaatggcatcgtaaagaacattttgagecatttcagtcagttgctcaatgtacctataaccagaccgttcagctggata 
ttacggcctttttaaagaccgtaaagaaaaataagcacaagttttatccggcctttattcacattcttgceccgcectgatga 
atgctcatccggaattacgtatggcaatgaaagacgstgagctgstgatatgeeatagtettcacccttgttacaccgtt 
ttccatgagcaaactgaaacgttttcatcgctctggagtgaataccacgacgatttccggcagtttctacacatatattcg 
caagatgtgecetettacgetgaaaacctggcctatttccctaaaggetttattgagaatatgtttttcgtctcagccaat 
ccctggetgagtttcaccagttttgatttaaacgtgeccaatatggacaacttcttcgcccccgttttcaccatgggcaaa 
tattatacgcaaggcgacaagetectgatgccgctgeceattcagettcatcatgccettigtgatggcttccatgicgg 
cagaatgcttaatgaattacaacagtactgcgatgagtescageecessocetaatttttttaagecagttattgetec 
ccttaaacgcctgettgctacgcctgaataagtgataataagcggatgaatgecagaaattcgaaagcaaattcgacc 
cggtcgtcgsttcagescaggetcettaaatagccgcttatgtctattgctgett 

kanR (1,165 bp) 
cctgtgacggaagatcacttcgcagaataaataaatcctgstetccctgtigataccgggaagccctggeccaacttttg 
gcgaaaatgagacettgatcggcacgtaagagettccaactttcaccataatgaaataagatcactaccggecetattt 
tttgagttgtcgagattttcaggagctaaggaagctaaaatgagccatattcaacgggaaacgtcgageccecgatta 
aattccaacatggatgctgatttatatgestataaatgeectcgcgataatgtcggecaatcagetgcgacaatctatcg 
cttgtatgggaagcccgatgcgccagagttgtttctgaaacatggcaaagetagcettgccaatgatgttacagatgag 
atggtcagactaaactggctgacggaatttatgcctcttccgaccatcaagcattttatccgtactcctgatgatgcatgg 
ttactcaccactgcgatccccggaaaaacagcattccaggtattagaagaatatcctgattcagetgaaaatattgttga 
tgcgctggcagtettcctgcgccgettgcattcgattcctgtttgtaattgtccttttaacagcgatcgcgtatttcgtctcg 
ctcaggcgcaatcacgaatgaataacggtttgstigatgcgagteattttgatgacgagcetaatgectgecctgtigaa 
caagtctggaaagaaatgcataaacttttgccattctcaccggattcagtcgtcactcatggteatttctcacttgataac 
cttatttttgacgageeoaaattaatagettgtattgatgttgsacgagtcggaatcgcagaccgataccaggatcttge 
catcctatggaactgcctcgetgagttttctccttcattacagaaacggctttttcaaaaatatgetattgataatcctgata 
tgaataaattgcagtttcatttgatgctcgatgagtttttctaatttttttaagecagttattgetgcccttaaacgcctgett 
gctacgcctgaataagtgataataagcgeatgaatggcagaaattcgaaagcaaattcgacccegtcetcgsttcage 
gcaggstcettaaatagccgcttatgtctattgctgett 

spec® (1,201 bp) 
cagccaggacagaaatgcctcgacttcgctgctgcccaagettgccgsstgacgcacaccetggaaacggatgaage 
cacgaacccagtggacataagcctgttcgeticgtaagctgtaatgcaagtagcetatgcgctcacgcaactggtcca 
gaaccttgaccgaacgcagcgetestaacgececagtgecesttttcatgscttgttatgactgtttttttggeetacag 
tctatgcctcggecatccaagcagcaagcecettaceccetgestcgatetttgatettatggagcagcaacgatgtta 
cgcagcaggecagtcgccctaaaacaaagttaaacatcatgaggeaagcesteatcgccgaagtatcgactcaacta 
tcagaggtagttgecetcatcgagcgccatctcgaaccgacettgctgeccetacatttgtacggctccgcagtggatg 
gcggcctgaagccacacagtgatattgatttgctgettacgstgaccetaagecttgatgaaacaacgcgecgagcttt 
gatcaacgaccttttggaaacttcggcttcccctggagagagcgagattctccgcgctgtagaagtcaccattgtigtg 
cacgacgacatcattccgtggcettatccagctaagcgcgaactgcaatttggagaatgecagcgcaatgacattcttg 
caggtatcttcgagccagccacgatcgacattgatctggctatcttgctgacaaaagcaagagaacatagcgttgectt 
ggtagetccagcggcegageaactctttgatccgettcctgaacaggatctatttgagecectaaatgaaaccttaacg 
ctatggaactcgccgcccgactggectgeceatgagcgaaatgtagtecttacettgtcccgcatttgetacagcgca 
gtaaccggcaaaatcgcgccgaaggatgtcgctgccgactggecaatgeagcecctgccgscccagtatcagcccg 
tcatacttgaagctagacagecttatcttggacaagaagaagatcgcttgecctcgcgcgcagatcagttggaagaat 
ttgtccactacgtgaaagecgagatcaccaaggtagtcggcaaataaagctttactgagctaataacaggactgctgg 
taatcgcaggcctttttatttctgca 

tolC (1,746 bp) 
ttgaggcacattaacgccctatggcacgtaacgccaaccttttgcgetagcgecttctgctagaatccgcaataattttac 
agtttgatcgcgctaaatactgcttcaccacaaggaatgcaaatgaagaaattgctccccattcttatcggcctgagcc 
tttctgggttcagttcgttgagccageccgagaacctgatgcaagtttatcagcaagcacgccttagtaacccggaatt 
gcgtaagtctgccgccgatcgtgatgctgcctttgaaaaaattaatgaagcgcgcagtccattactgccacagctagg 
tttaggtgcagattacacctatagcaacggctaccgcgacgcgaacegcatcaactctaacgcgaccagtgcetcctt 
gcagttaactcaatccatttttgatatgtcgaaatgecetgcettaacgctgcaggaaaaagcagcaggeattcaggac 
gtcacgtatcagaccgatcagcaaaccttgatcctcaacaccgcgaccgcttatttcaacgtettgaatgctattgacg 
ttctttcctatacacaggcacaaaaagaagcgatctaccgtcaattagatcaaaccacccaacgttttaacgtggecct 
ggtagcgatcaccgacgtgcagaacgcccgcgcacagtacgataccetgctgecgaacgaagtgaccgcacgtaat 
aaccttgataacgcgetagagcagctgcgccagatcaccggtaactactatccggaactggctgcectgaatgtcgaa 
aactttaaaaccgacaaaccacagccggsttaacgcectgctgaaagaagccgaaaaacgcaacctgtcgctgttaca 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


ggcacgcttgagccaggacctggcececgagcaaattcgccagececageatgetcacttaccgactctggatttaa 
cggcttctaccgggatttctgacacctcttatagcgettcgaaaacccgtgsteccectgetacccagtatgacgatag 
caatatgggccagaacaaagttggcctgagcttctcgctgccgatttatcagescgeaatgettaactcgcagstgaa 
acaggcacagtacaactttgtcggteccagcgagcaactggaaagtgcccatcgtagcetcgtgcagaccetecettc 
ctccttcaacaacattaatgcatctatcagtagcattaacgcctacaaacaagccgtagtttccgctcaaagctcattag 
acgcgatggaagceeectactcgstcgstacecetaccattgtigatgtettgsatgcgaccaccacgttgtacaacg 
ccaagcaagagctggcgaatgcecettataactacctgattaatcagctgaatattaagtcagctctggstacettgaa 
cgagcaggatctgctggcactgaacaatgcgctgagcaaaccegstttccactaatccggaaaacgttgcaccgcaaa 
cgccggaacagaatgctattgctgatgettatgcecctgatagcccggcaccagicgttcagcaaacatccgcacgca 
ctaccaccagtaacggtcataaccctttccgtaactgatgacgacgacgeeeaagcttaattagctgatctagageca 
tcaaataaaacgaaagectcagtcgaaagactggecctticgttttatctgttgtttgtcgstgaacgctctcctgagtag 
gacaaatccgccgccctaga 

zeoR (762 bp)
Ggtgttgacaattaatcatcggcatagtatatcegcatagtataatacgacaaggtgageaactaaaccatggccaag 
ttgaccagtgccgttccggtgctcaccececgcgacetceccegagcestcgagttctggaccgaccggctcgeesttc 
teccgggacttcgtggageacgacttcgccggtetestccegeacgacetgaccctgttcatcagcgcgetccaggac 
caggtestgccggacaacaccctggcctggetetesstececescctggacgagctetaceccgagtestcgeage 
tegtgtccacgaacttccgggacecctccggseccegccatgaccgagatcgecgagcagccetgeesecegeactt 
cgccctgcgcgacccegccggcaactgcetgcacttcgtgeccgageagcageactgacacgtccgacgecesccc 
acggetcccagecctcggagatccetcccccttttcctttgtcgatatcatgtaattagttatgtcacgcttacattcacg 
ccctccccccacatccgctctaaccgaaaaggaageagttagacaacctgaagtctagetccctatttatttttttatag 
ttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatact 
gaaaaccttgcttgagaagettttgeeacectcgaagectttaatttgcaagct 
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Extended Data Figure 1 | Comprehensive map of synthetic auxotrophs. 
Circos plot summarizing synthetic auxotrophs generated in this study. Red and 
green genes reflect knockouts and insertions, respectively. Outermost ticks 
indicate genomic location, inner blue ticks indicate locations where TAG 
codons were converted to TAA in the GRO, and green ticks reflect the locations
of 303 E. coli essential genes. The shaded grey inner circle contains essential
TAG loci in synthetic auxotrophs, where yellow ticks represent amino-terminal
insertions, blue ticks represent tolerant substitutions, and red ticks represent
functional-site substitutions. Innermost links represent unique combinations
of TAGs in higher-order synthetic auxotrophs. Links of a single colour
correspond to a single strain.
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Extended Data Figure 2 | rpsD.Q54R is sufficient for loss of pAcF- 
dependence in SecY.Y122α. a, Plate map with genotypes of strains shown in
b and c. On the top half of the plate SecY.Y122α.E1 (top right quadrant)
contains the rpsD.Q54R mutation and is an escape mutant of pAcF-auxotroph,
SecY.Y122α (top left quadrant). On the bottom half of the plate the
rpsD.Q54R mutation was introduced into SecY.Y122α (bottom right quadrant),
resulting in a loss of pAcF-dependence, and reverted to wild type in
SecY.Y122α.E1 (bottom left quadrant), restoring pAcF-dependence. The
amino acid present at residue 54 within RpsD is indicated at the perimeter of
the plate, where red signifies that the given mutation was introduced into the
genotype by MAGE to demonstrate the causal mechanism of escape. b, Growth
on solid permissive media demonstrates growth of all strains. c, Growth on
solid non-permissive media. Introduction of the rpsD.Q54R mutation into the
synthetic auxotroph SecY.Y122α results in loss of containment (bottom right
quadrant). Reverting the mutation to wild type in SecY.Y122α.E1 results in
restoration of containment (bottom left quadrant). Together, these data
demonstrate that the rpsD.Q54R mutation is sufficient for loss of
pAcF-dependence in SecY.Y122α.
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Extended Data Figure 3 | Quantitative assessment of amino acid tolerance 
in higher-order pIF auxotrophs. Representative assay surveying tolerance 
of one of three essential TAG loci to the twenty amino acids in different 
synthetic auxotrophs and expressed as log10 of total cell survival. The + symbol
indicates the presence of a TAG codon at the specified locus in the background 
strain and — indicates the wild-type codon. Blue and yellow indicate high 
and low tolerance to substitution, respectively. Substitutions DnaX.Y113W and 
SecY.Y122Q are tolerated but yielded a lower percentage of survival on
non-permissive media in a background with two TAGs, an effect that was
pronounced in a background with three TAGs. While DnaX.Y113, SecY.Y122
and LspA.Y54 are permissive for most natural amino acids, strains with more 
than one of these essential TAGs are less prone to survive in the event that 
any one TAG is compromised. SecY.Y122Q and DnaX.Y113W were 
tolerated substitutions also observed in real escape mutants of these strains 
(Supplementary Table 7). Reported results repeated at least three times in 
independent experiments. Refer to the Methods for a complete description of 
this experiment. 
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Extended Data Figure 4 | Deletion of tyrT and tyrV restores pIF- 
dependence and fitness of rEc.β.dC.12′.E7. a, Plate map with genotypes of
strains shown in b and c. rEc.β.dC.12′.E7 is an escape mutant of its
sAA-dependent ancestor (rEc.β.dC.12′) and contains a tyrT ochre suppressor
mutation (supC). The fitness of rEc.β.dC.12′.E7 in permissive media is
impaired relative to rEc.β.dC.12′, with doubling times of 91.74 (±1.49) and
61.81 (±0.65) minutes, respectively. Tyrosine tRNA redundancy was
eliminated (ΔtY) in both strains by λ-Red mediated replacement of tyrT and
tyrV with chloramphenicol acetyltransferase (cat), rendering the resulting
strains (rEc.β.dC.12′.ΔtY and rEc.β.dC.12′.E7.ΔtY) dependent on tyrU for
tyrosine incorporation during normal protein synthesis. Elimination of
tyrosine redundancy reduced the escape frequency of rEc.β.dC.12′ from
2.17X 10° (Fig. 2e) to <4.85 × 10⁻¹² (no escape mutants were observed
upon plating 2.06 × 10¹¹ cells) and restored pIF-dependence in
rEc.β.dC.12′.E7 to <4.73 × 10⁻¹² (no escape mutants were observed upon
plating 2.12 × 10¹¹ cells). Escape mutants were not detected for either strain up
to 7 days after plating on non-permissive media (Fig. 3d and Supplementary
Table 11). Tyrosine tRNA deletion also restored the fitness of the escape mutant
to approximately that of its sAA-dependent ancestor (60.66 ± 0.12 min).
Taken together, these results establish tyrT as the causal mechanism of escape in
rEc.β.dC.12′.E7. b, Growth on solid permissive LB media. c, Growth on solid
non-permissive LB media. All reported doubling times are averages, where
n = 3 technical replicates, and error bars represent ±s.d. Refer to the Methods
for a complete description of escape frequencies.
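
The "<" values in this legend follow from the plating numbers: when no escape mutant is seen among N plated cells, the escape frequency can only be bounded above by 1/N. The sketch below reproduces that arithmetic. The function name is hypothetical, and the approach is a reading of the legend rather than the authors' stated procedure, which is given in the Methods.

```python
def escape_frequency_upper_bound(cells_plated: float) -> float:
    """Upper bound on escape frequency when zero escape colonies are observed.

    With no escapees among N plated cells, the frequency is reported as < 1/N.
    """
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return 1.0 / cells_plated


# Cell counts quoted in the legend (no escape mutants observed in either case).
platings = {
    "rEc.β.dC.12′.ΔtY": 2.06e11,
    "rEc.β.dC.12′.E7.ΔtY": 2.12e11,
}

for strain, cells in platings.items():
    bound = escape_frequency_upper_bound(cells)
    print(f"{strain}: escape frequency < {bound:.2e}")
```

The first bound matches the quoted 4.85 × 10⁻¹²; the second comes out close to the quoted 4.73 × 10⁻¹², with the small difference presumably due to rounding of the reported cell count.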
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Extended Data Figure 5 | Growth profiles of strains expressing 
phenylalanine or tryptophan amber-suppressor tRNAs. a–d, Growth was
assessed for rEc.γ.dC.46′.ΔtY and rEc.β.dC.12′.ΔtY in the presence of amber
suppression by either pTech-supU (blue), pTech-supPhe (red), or in the
absence of plasmid-based amber suppression (black). Cells were washed twice
with dH2O and re-suspended in the same volume of 1× PBS. Washed cells were
normalized by OD600 to inoculate roughly equal numbers of cells per well.
Growth profiles are shown for rEc.γ.dC.46′.ΔtY (a, b) and rEc.β.dC.12′.ΔtY
(c, d) in permissive (+sAA/+L-arabinose, solid lines) and non-permissive
(−sAA/−L-arabinose, dashed lines) LB liquid media. Doubling times are
shown for the ancestral strain (black) in permissive media and
suppressor-containing strains (red and blue) in non-permissive media where
growth was observed. Plasmid-containing strains were always grown in the
presence of zeocin for plasmid maintenance. Growth was never observed for the
contained ancestors in non-permissive media (black, dashed lines). In the presence
of tryptophan suppression, growth of rEc.γ.dC.46′.ΔtY was not observed and
growth of rEc.β.dC.12′.ΔtY was severely impaired (380 min doubling time),
with a 6.24-fold increase in doubling time relative to the contained ancestor
grown in permissive media. In the presence of phenylalanine suppression,
growth of rEc.β.dC.12′.ΔtY was not observed and growth of rEc.γ.dC.46′.ΔtY
was severely impaired (252 min doubling time), with a 3.90-fold increase in
doubling time relative to the contained ancestor grown in permissive media.
Representative growth profiles and doubling times are reported. These results
repeated at least three times in individual experiments.




Extended Data Figure 6 | Long-term growth of rEc.γ.dC.46′.ΔtY in liquid
LB media relative to rEc.γ. a–c, Approximately 10¹¹ cells of strain
rEc.γ.dC.46′.ΔtY (indicated with triangles) were inoculated into 1 l of
permissive (+sAA/+L-arabinose, blue) or non-permissive (−sAA/−L-arabinose,
red) LB media and incubated with agitation at 34 °C for 20 days.
Results from the equivalent experiment with the non-contained ancestor rEc.γ
(indicated with diamonds) are also shown. Cultures were frequently monitored
by OD600 (a) and quantification of c.f.u. on solid permissive
(+sAA/+L-arabinose) (b) and non-permissive (−sAA/−L-arabinose) (c) LB media.
C.f.u. are plotted as the average of three replicates. Open symbols indicate
that no c.f.u. were observed. Symbols for rEc.γ.dC.46′.ΔtY are not visible
because c.f.u. were never observed from either permissive or non-permissive
liquid cultures plated on non-permissive solid media. At the end of the 20-day
growth period, both cultures containing rEc.γ.dC.46′.ΔtY were interrogated for
the presence of a single escape mutant by plating each 1 l of culture across
30 non-permissive solid media plates. C.f.u. were not observed and remained
below the limit of detection for the following 7-day observation period. We
hypothesize that the decrease in c.f.u. counts obtained on permissive solid
media for the permissive culture of rEc.γ.dC.46′.ΔtY reflects pAzF degradation
at =6 days. Reported c.f.u. values are averages, where n = 3 technical
replicates, and error bars are ±s.d. Reported results repeated at least three
times in independent experiments. Refer to the Methods for a complete
description of this experiment.
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Extended Data Figure 7 | Dose-dependent growth of rEc.γ.dC.46′.ΔtY in
pAzF and L-arabinose compared to the non-contained ancestor. Growth in
LB media supplemented with different concentrations of pAzF and
L-arabinose. a–d, Growth profiles for rEc.γ across a gradient of pAzF
concentrations in the presence of 0% (a), 0.002% (b), 0.02% (c) and 0.2% (d)
L-arabinose. f–i, Growth profiles for rEc.γ.dC.46′.ΔtY across a gradient of pAzF
concentrations in the presence of 0% (f), 0.002% (g), 0.02% (h) and 0.2% (i)
L-arabinose. e, j, Growth profiles illustrated in a–d and f–i are depicted as heat
maps in e and j, respectively, where the maximum OD600 was obtained from
the average of three replicates and plotted in MATLAB. Reported growth
profiles and heat map values are averages, where n = 3 technical replicates.
Reported results repeated at least three times in independent experiments.
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Extended Data Figure 8 | Dose-dependent growth of rEc.β.dC.12′.ΔtY in
pIF and L-arabinose. Growth in LB media supplemented with different
concentrations of pIF and L-arabinose. a–d, Growth profiles for
rEc.β.dC.12′.ΔtY across a gradient of pIF concentrations in the presence of
0% (a), 0.002% (b), 0.02% (c) and 0.2% (d) L-arabinose. e, Growth profiles
illustrated in parts a–d are depicted as a heat map, where the maximum OD600
was obtained from the average of three replicates and plotted in MATLAB.
Reported growth profiles and heat map values are averages, where n = 3
technical replicates. Reported results repeated at least three times in
independent experiments.
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Extended Data Figure 9 | Proximity-dependent complementation of biotin 
auxotrophy. Wild-type E. coli K-12 substr. MG1655 and three strains 
auxotrophic for biotin, ECNR2, rEc.γ (a non-contained GRO with an integrated
pAzF OTS) and rEc.γ.dC.46′ (also a synthetic auxotroph) were grown either
adjacent or separately on rich-defined solid media. ECNR2 grew on
biotin-deficient media when plated in close proximity to wild-type E. coli, suggesting
cross-feeding of the essential metabolite. The pAzF auxotroph only grew on 
media supplemented with biotin, pAzF and L-arabinose. 
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Predicting climate-driven regime shifts versus 
rebound potential in coral reefs 


Nicholas A. J. Graham, Simon Jennings, M. Aaron MacNeil, David Mouillot & Shaun K. Wilson


Climate-induced coral bleaching is among the greatest current threats 
to coral reefs, causing widespread loss of live coral cover. Conditions
under which reefs bounce back from bleaching events or shift from 
coral to algal dominance are unknown, making it difficult to predict 
and plan for differing reef responses under climate change. Here we
document and predict long-term reef responses to a major climate- 
induced coral bleaching event that caused unprecedented region- 
wide mortality of Indo-Pacific corals. Following loss of >90% live 
coral cover, 12 of 21 reefs recovered towards pre-disturbance live coral 
states, while nine reefs underwent regime shifts to fleshy macroalgae. 
Functional diversity of associated reef fish communities shifted sub- 
stantially following bleaching, returning towards pre-disturbance struc- 
ture on recovering reefs, while becoming progressively altered on 
regime shifting reefs. We identified threshold values for a range of 
factors that accurately predicted ecosystem response to the bleach- 
ing event. Recovery was favoured when reefs were structurally complex 
and in deeper water, when density of juvenile corals and herbivorous 
fishes was relatively high and when nutrient loads were low. Whether 
reefs were inside no-take marine reserves had no bearing on ecosys- 
tem trajectory. Although conditions governing regime shift or recov- 
ery dynamics were diverse, pre-disturbance quantification of simple 
factors such as structural complexity and water depth accurately pre- 
dicted ecosystem trajectories. These findings foreshadow the likely 
divergent but predictable outcomes for reef ecosystems in response 
to climate change, thus guiding improved management and adaptation. 

Some mass bleaching events have resulted in the loss of almost all 
live coral within individual nations. Examples of coral recovery following
severe bleaching have been documented, yet theory predicts that
regime shifts to new benthic assemblages, such as macroalgae, are also
likely. To date, there are no documented coral reef regime shifts attributed
specifically to climate change. Evidence for coral reef regime shifts
due to other causes comes almost exclusively from the Caribbean, with
limited knowledge from the more extensive and diverse Indo-Pacific
reef province. Ongoing uncertainty about Indo-Pacific coral reef responses
to climate impacts has generated considerable debate regarding
appropriate management and adaptation plans, especially those related
to no-take marine reserves. This debate can be resolved by understanding the
site and ecosystem characteristics that dictate reef ecosystem trajectories.

Using a 17-year data set, spanning a major climate-induced bleaching
event, we assess the long-term ecosystem dynamics of 21 reef sites across
the inner islands of Seychelles. Seychelles reefs were the most severely
affected globally by the 1998 coral bleaching event, in which a strong El
Niño coincided with the Indian Ocean dipole. Across all sites before
the bleaching event, average hard coral cover was 28% and macroalgal
cover 1% (Fig. 1a), within average bounds for the Indian Ocean region
at the time. The mass bleaching event was severe across all reefs in the
inner Seychelles, with coral cover reduced by >90%. Both hard coral
cover and macroalgal cover steadily increased between 2005 and 2011,
with high heterogeneity reflecting markedly different trajectories among
sites (Fig. 1a, Extended Data Figs 1 and 2).

Using four complementary metrics (Methods), we defined a recovering
reef as having greater post-disturbance coral cover than macroalgae
cover, with coral cover remaining high or increasing. A regime shifting 
reef was defined as having greater post-disturbance macroalgal cover 
than coral cover, with macroalgal cover remaining high or increasing. 
We used an index, based on the Euclidian distance from pre-bleaching 
benthic composition, to visualize the differing benthic trajectories of 
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Figure 1 | Recovery and regime shift dynamics on Seychelles coral reefs. 

a, Box (median and 50% quantile) and whisker (95% quantile) plots of coral 
cover and macroalgal cover in 1994 and through time after the 1998 bleaching 
event (n = 84). b, Conceptual diagram for visualizing recovery and regime shift 
dynamics, using post-disturbance cover of coral (blue) and macroalgae (red) 
and Euclidian distance of benthic composition from pre-disturbance values. 
c, Recovery and regime shifts on Seychelles reefs. Lines are multinomial 
model fits for percent coral and macroalgae on each of the 21 reef sites (years 
distinguished by shading), with Euclidian distance from pre-disturbance 
composition as a predictor (n = 63). Each reef site is represented
by two points, one for coral cover (blue dots) and one for macroalgal cover
(red dots). Twelve reefs (top of figure) are recovering, moving towards pre- 
disturbance (1994) compositions, with increasing coral cover and low 
macroalgal cover, whereas 9 reefs are regime shifting, with increasing distance 
from 1994 compositions, increasing macroalgal cover and low coral cover 
(Extended Data Table 1, Extended Data Fig. 4). 
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regime shifting and recovering reefs (Fig. 1b). Of the 21 reefs surveyed, 
12 followed a post-bleaching recovery trajectory, with live coral cover 
increasing through time to an average cover of 23% by 2011, macro- 
algal cover remaining low (<1%), and benthic composition moving 
towards that observed in 1994 (Fig. 1c, Extended Data Table 1, Extended 
Data Figs 3 and 4). Annual increases in coral cover on recovering reefs 
were low for 7-10 years following bleaching, but then increased substantially,
probably reflecting increasing local recruitment levels necessary
for rapid recovery in isolated coral communities. In contrast, 9 of
the 21 reefs followed a post-bleaching regime shift trajectory, with aver- 
age macroalgal cover steadily increasing to 42% by 2011, coral cover 
remaining low (<3%), and benthic composition diverging substantially 
from the pre-bleaching state (Fig. 1c, Extended Data Table 1, Extended 
Data Figs 3 and 4). Percent cover of coral and macroalgae did not differ 
between these two groups of reefs in 1994 (Extended Data Table 1), pro- 
viding the first clear evidence that regime shifts to macroalgae-dominated 
states occur in response to major coral bleaching events on Indo-Pacific 
reefs. 
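
As a rough illustration of the distance index and the recovering versus regime-shifting distinction described above, the following sketch computes the Euclidean distance between a pre-bleaching and a post-bleaching benthic composition and applies a simplified classification rule. It is an assumption-laden simplification: the paper uses four complementary metrics and the full benthic composition, whereas this example uses only two cover categories, treats the all-site 1994 averages as the baseline, and reads the reported "<1%" and "<3%" values as 1 and 3. All function and variable names are hypothetical.

```python
import math


def euclidean_distance(before: dict, after: dict) -> float:
    """Euclidean distance between two benthic compositions (percent cover per category)."""
    if before.keys() != after.keys():
        raise ValueError("compositions must use the same benthic categories")
    return math.sqrt(sum((after[k] - before[k]) ** 2 for k in before))


def classify_trajectory(coral_cover: float, macroalgal_cover: float) -> str:
    """Simplified rule: whichever of coral or macroalgae has greater cover decides the label."""
    if coral_cover > macroalgal_cover:
        return "recovering"
    if macroalgal_cover > coral_cover:
        return "regime shifting"
    return "undetermined"


# Illustrative values taken from the group averages quoted in the text.
pre_bleaching = {"coral": 28.0, "macroalgae": 1.0}
recovering_2011 = {"coral": 23.0, "macroalgae": 1.0}
shifted_2011 = {"coral": 3.0, "macroalgae": 42.0}

for label, reef in [("recovering group", recovering_2011), ("shifted group", shifted_2011)]:
    distance = euclidean_distance(pre_bleaching, reef)
    trajectory = classify_trajectory(reef["coral"], reef["macroalgae"])
    print(f"{label}: distance from 1994 = {distance:.1f}, classified as {trajectory}")
```

Under these assumed values the recovering group sits close to its 1994 composition while the shifted group lies far from it, which is the pattern the index is meant to visualize.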

Divergent ecosystem trajectories following bleaching are likely to 
have major implications for reef associated organisms. The functional 
structure of communities, which captures species’ roles based on their 
biological traits, is predicted to show deterministic links to disturbance,
but predictions are rarely tested with data. Here we show changes to 
functional structure of reef fish assemblages are strongly tied to the post- 
bleaching benthic response (Fig. 2, Methods). Seven years after bleach- 
ing, assemblages on all reefs had fewer small bodied species and a less 
complex functional structure, typified by increasing dominance of inver- 
tebrate feeding fishes. On recovering reefs, a smaller initial change in 
functional structure was followed by a return towards the pre-bleaching 
state (Fig. 2b), whereas the much greater change on reefs that underwent 
regime shifts progressed in post-bleaching years, suggesting important 
functions are being eroded (Fig. 2c). 

We assessed the association between 11 reef-level factors and eco- 
system trajectories post-disturbance, a priori selected for their generic 
importance in structuring coral reef ecosystems globally (Extended Data 
Table 2). Five of the factors correctly (97% of the time) characterized 
post-bleaching reef trajectories as ‘recovery’ or ‘regime shift’: density of 
juvenile corals, initial structural complexity of the reef, water depth, bio- 
mass of herbivorous fishes (also reflecting herbivore diversity, which 
was collinear), and nutrient conditions of the reef (Fig. 3). Estimates of 
critical values for each factor, where recovery was more likely to occur 
than a regime shift, showed that juvenile coral densities >6.2 per m² reduce
the probability of a regime shift (Fig. 3a), consistent with the expected
role of coral recruitment and survival in reef recovery. Coral larval supply
does not differ among recovering and regime shifting reefs in Seychelles, 
but post-settlement survival is lower on regime shifted reefs, likely
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due to unsuitable settlement substrate, or enhanced post-settlement 
mortality due to macroalgal overgrowth or allelopathic mechanisms.
Structural complexity, which captures the structure provided by corals 
and the underlying reef matrix, decreased the risk of a regime shift tra- 
jectory when values before disturbance were >3.1 (widespread mod- 
erately complex relief) (Fig. 3b). Structural complexity influences a range 
of ecological processes, being a substantial contributor to diversity and 
productivity of many reef associated organisms. Sites deeper than
6.6 m were less likely to undergo a regime shift (Fig. 3c), possibly
reflecting the relationship between light penetration and algal growth,
or greater vulnerability of shallower reefs to disturbances such as recurrent
coral bleaching or storm damage. Experimental and modelling
studies will be necessary to clarify the principal mechanisms by which 
structural complexity and depth influence ecosystem trajectories. 
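
The critical values reported above are the points at which a regime shift and recovery are equally likely. In a logistic model with a single predictor, that point has a closed form: P(shift) = 0.5 where intercept + slope × x = 0, so x* = −intercept / slope. The sketch below illustrates this relationship only. The coefficients are hypothetical, chosen so that the critical depth equals the 6.6 m quoted in the text; they are not the posterior estimates from the paper's Bayesian hierarchical model, which includes several predictors.

```python
import math


def shift_probability(x: float, intercept: float, slope: float) -> float:
    """Probability of a regime shift under a single-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def critical_value(intercept: float, slope: float) -> float:
    """Predictor value at which P(shift) = 0.5, i.e. where intercept + slope * x = 0."""
    if slope == 0:
        raise ValueError("slope must be non-zero for a critical value to exist")
    return -intercept / slope


# Hypothetical coefficients: a negative slope means deeper sites are less likely to shift.
SLOPE = -1.5
INTERCEPT = 1.5 * 6.6  # chosen so the critical depth is 6.6 m

depth_threshold = critical_value(INTERCEPT, SLOPE)
print(f"critical depth: {depth_threshold:.1f} m")
for depth in (4.0, 6.6, 9.0):
    print(f"depth {depth:.1f} m -> P(shift) = {shift_probability(depth, INTERCEPT, SLOPE):.2f}")
```

With a negative slope, sites shallower than the critical depth have P(shift) above 0.5 and deeper sites fall below it, mirroring the direction of the depth effect described in the text.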

Reef fish herbivory is a key process mediating competition between
corals and algae, and herbivore biomass, implicitly linked to body size,
is a good proxy for overall rates of herbivory. Interestingly, a relatively
low biomass of herbivores (177 kg ha⁻¹), below average values for the
Indian Ocean, reduced the risk of a regime shift occurring (Fig. 3d).
Algal proliferation and dominance over corals is also influenced by nutrient
input to reef systems. Here, lower C:N ratios in macroalgal fronds,
indicative of higher nutrient loads, were linked to a greater likelihood
of regime shifts, with that likelihood falling below 50% once ratios
exceeded 38 (Fig. 3e). Previous debate has focused on whether
levels of herbivory or nutrients mediate coral reef regime shifts or recovery.
Our results suggest that although both variables relate to ecosystem
trajectory, they are weaker and less certain predictors than structural
complexity, depth, and the density of juvenile corals.
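
The five critical values reported in the text can be collected into a simple screening check. The sketch below is only an illustration of how the thresholds read, not the authors' model: the paper estimates probabilities with a Bayesian hierarchical logistic regression, whereas this example merely lists which factors at a site exceed the value above which recovery was more likely than a regime shift. The dictionary keys, function name and example site are hypothetical.

```python
# Critical values quoted in the text; recovery was more likely above each one.
RECOVERY_THRESHOLDS = {
    "juvenile_coral_per_m2": 6.2,
    "structural_complexity": 3.1,
    "depth_m": 6.6,
    "herbivore_biomass_kg_ha": 177.0,
    "macroalgal_cn_ratio": 38.0,
}


def factors_favouring_recovery(site: dict) -> list:
    """Return the names of factors whose value exceeds the reported critical value."""
    missing = set(RECOVERY_THRESHOLDS) - set(site)
    if missing:
        raise KeyError(f"site is missing factors: {sorted(missing)}")
    return [name for name, limit in RECOVERY_THRESHOLDS.items() if site[name] > limit]


# Hypothetical site: shallow but otherwise above every threshold.
example_site = {
    "juvenile_coral_per_m2": 8.0,
    "structural_complexity": 3.4,
    "depth_m": 5.0,
    "herbivore_biomass_kg_ha": 200.0,
    "macroalgal_cn_ratio": 41.0,
}

favourable = factors_favouring_recovery(example_site)
print(f"{len(favourable)} of {len(RECOVERY_THRESHOLDS)} factors favour recovery: {favourable}")
```

A count like this ignores the differing strength and uncertainty of each predictor, which the text notes is greater for herbivore biomass and nutrients than for complexity, depth and juvenile coral density.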

Reefs within no-take marine reserves were no more likely to recover 
than reefs in fished areas. Although marine reserves may have a positive 
long-term influence on coral cover in the absence of disturbance, our
results suggest they may have little influence on post-disturbance benthic
trajectories. In Seychelles, marine reserves do enhance herbivore
biomass (2005 values: 279 kg ha⁻¹ ± 21.5 s.e.) compared to fished areas
(mean 163 kg ha⁻¹ ± 58.6 s.e.), but biomass in fished areas was still close
to the threshold we identified for recovery. Marine reserves may have a 
greater role in aiding coral recovery in nations where herbivorous fishes 
are more heavily exploited or where fishing gears that reduce structural 
complexity are used. 

Collecting data on nutrients and juvenile coral densities can be chal- 
lenging for the often resource-limited agencies charged with monitor- 
ing and managing coral reef ecosystems, especially over large areas. We 
therefore assessed the predictive power of structural complexity and 
water depth alone to predict reef trajectories. Both variables can be rapidly 
recorded over large areas and are relatively stable through time, allowing 
extensive pre-disturbance data to be compiled. Using only these two 
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Figure 2 | Trajectories in the functional structure of fish assemblages. 

a, Functional trait space underlying the analyses, with arrows depicting 
increasing body size amongst each fish functional group, depicted by fish 
outlines: corallivores (1), grazing herbivores (2), invertebrate feeders (3), 
planktivores (4), piscivores (5), scraping and excavating herbivores (6). 

b, c, Position of recovered (b) and regime shifted (c) reef sites in the functional 
space. Black dots represent the mean trait values in 1994, pale to darker colours 
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represent 2005-2011 (n = 84). Large rings indicate centroid for sites within 
each year (2005 and 2008 overlap in b). Dashes on the axes indicate centroid 
positions for each year. Year and ecosystem trajectory were significant factors 
(Methods, ANOVA model: axis 1 year P < 0.001, trajectory P < 0.01; axis 2
year P < 0.001, trajectory P < 0.05). Fish graphics by T. Saxby, D. Kleine and
J. Woerner (Integration and Application Network, University of Maryland 
Center for Environmental Science, http://ian.umces.edu/imagelibrary/). 
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Figure 3 | Bayesian hierarchical logistic regression of predictor variables for 

the probability of a regime shift. a–e, Marginal plots for predictor variables
included in the full model: solid black line represents the mean model fit;
grey lines are random resamples of the parameter estimates, representing
variability of the expected model fit. Grey dots represent replicates from regime


variables, we correctly predicted reef trajectory 98% of the time. This 
suggests that, when combined, structural complexity and depth can 
effectively identify reefs with a high likelihood of recovery or regime 
shift, thereby informing reef risk assessment and guiding marine spa- 
tial planning initiatives (Fig. 4). 

We have demonstrated persistent and divergent responses of coral 
reefs to climate-induced bleaching with potential consequences for reef 
fish functioning. With predictions that mass coral bleaching events will 
increase in frequency, our findings foreshadow the likely divergent trajectories
expected on other Indo-Pacific reefs. We show that several factors
can affect reef ecosystem trajectories following bleaching but, where 
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Figure 4 | Contour biplot of the probability of regime shift (red shading) or 
recovery (blue shading) based on initial structural complexity and water 
depth. For example, for a complexity value of 3, there is a 0.9 probability 

of a regime shift occurring for sites <4.5 m depth (red dot), and only sites 
deeper than ~8 m are more likely to recover. For a complexity value of 3.2, 
sites deeper than 6.3 m are highly likely to recover (blue dot). 
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shifted (1) or recovering (0) reefs (n = 184). The blue dot represents the point at 
which regime shifts and recovery are equally likely. Blue shading is beyond 
upper 95% uncertainty interval for P (shift) = 0.5, where regime shifts are 
decreasingly likely. f, Effect size medians (black dot), with 50% (thick line) and
95% (thin line) highest posterior density (HPD) uncertainty intervals. 


necessary, just depth and structural complexity may be useful predic- 
tors of ecosystem fate. The predictor factors we identify are important 
for coral reefs globally (Methods) and the range of values reported for 
Seychelles is consistent with other locations in the Indo-Pacific (Extended 
Data Table 3). Further, depth and initial structural complexity evaluated 
across 6 other countries from East Africa to the South Pacific were con- 
sistent predictors of coral or macroalgal dominated reefs post-disturbance, 
with effect sizes overlapping those from Seychelles (Extended Data Fig. 5). 
However, there may be regional variations in the relative contribution 
of factors and threshold values we identified that require more invest- 
igation. Uncovering the predictors that dictate reef trajectories following 
major bleaching events can inform ecosystem management and strat- 
egies for human adaptation. Mapping probable ecosystem trajectories 
can enable limited management resources to target actions, such as phasing
out fishing gears that cause habitat damage, on reefs where increased
structural complexity could help promote recovery. Spatial understanding
of ecosystem vulnerability to climatic impacts also holds great
potential to be linked with social vulnerability assessments to develop
the human adaptation strategies needed to cope with anticipated regime
shifts and associated changes in ecosystem services.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


Field Surveys and indices. Twenty one reefs throughout the inner Seychelles Islands 
were surveyed using identical methods in 1994, before the mass coral bleaching of 
1998, and in 2005, 2008 and 2011, following the bleaching event. The surveys 
incorporated three different reef habitat types: carbonate fringing reefs; granitic
rocky reefs with coral growth; and patch reef habitats on sand, rubble, or rock base. 
Eight to sixteen replicate 7 m radius point counts were surveyed along the reef slope 
on each reef, covering up to 0.5 km of reef front and 2,500 m² of reef habitat. The
underlying substrate type and water depth were recorded at each site. Within each 
point count area, the percent cover of different growth forms of live hard coral, soft 
coral, macroalgae, sand, rubble, and rock was quantified, and the structural complex- 
ity of the reef was visually estimated on a 6 point scale’. This structural complexity 
measure captures landscape complexity, including the complexity provided by live 
corals, that of the underlying reef matrix and other geological features, and has 
been shown to correlate well to other measures of complexity, such as measures of 
reef height and the linear versus contour chain method’’. The complexity of each 
individual count area was assigned to one of the following categories: 0 = no vertical 
relief, flat or rubbly areas; 1 = low (<30 cm high) and sparse relief; 2 = low but 
widespread relief; 3 = widespread moderately complex (30-60 cm high) relief; 
4 = widespread very complex (60-100 cm high) relief with numerous fissures and 
caves; 5 = exceptionally complex (>1 m high) relief with numerous caves and over- 
hangs. Estimates can be standardised among observers with minimal training, thereby 
providing reliable and comparable data of ecological relevance*’. The density and 
individual sizes of 134 species of diurnally active, non-cryptic, reef associated fish 
were recorded within each point count area. Length estimation was calibrated at 
the beginning of each day by estimating the lengths of PVC pipes of known length.
Accuracy was within 4% of actual lengths*’. We converted data on fish counts to biomass
with published length-weight relationships**™*. Fish were assigned to feeding
groups based on their dominant diets and feeding behaviour’. The density of juvenile 
corals (<10 cm diameter) was quantified in 2011, within 8 replicate 33 cm by 33 cm 
randomly placed quadrats in each point count area. Juvenile coral data were also 
collected in 2008, and correlated strongly across sites with 2011 data (r = 0.85), 
suggesting that patterns in juvenile coral densities are fairly consistent through the 
post-disturbance years surveyed. Sea urchin density (family: Diadematidae) was 
counted in 2008 in each point count area. Meta-data from these surveys will be 
deposited in the James Cook University research data repository, the Tropical Data 
Hub (https://eresearch.jcu.edu.au/tdh). 
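
The conversion from length estimates to biomass can be sketched as follows. This is a minimal illustration, assuming the usual allometric form W = a·L^b for the published length-weight relationships and scaling by the area of a 7 m radius point count; the coefficients `a` and `b` shown are hypothetical placeholders, not values used in the study.

```python
import math

POINT_COUNT_RADIUS_M = 7.0  # radius of each point count, from the survey design


def fish_weight_g(length_cm, a, b):
    """Allometric length-weight relationship W = a * L**b (grams from cm)."""
    return a * length_cm ** b


def biomass_kg_per_ha(lengths_cm, a, b, radius_m=POINT_COUNT_RADIUS_M):
    """Sum individual weights in one point count and scale to kg per hectare."""
    area_m2 = math.pi * radius_m ** 2
    total_g = sum(fish_weight_g(length, a, b) for length in lengths_cm)
    return (total_g / 1000.0) * (10_000.0 / area_m2)


# Hypothetical coefficients for a single species, for illustration only.
example_biomass = biomass_kg_per_ha([20.0, 25.0, 30.0], a=0.02, b=3.0)
```

In practice each species would carry its own coefficients, and the per-species values would be summed within each point count before averaging across replicates.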

To assess local nutrient levels ten fronds from each of ten different Sargassum 
thalli were taken from each reef site in 2014. These samples were dried for 48 h at 
60 °C, powdered and analysed for nitrogen and carbon abundance. Carbon and 
nitrogen weight percent (%C and %N, respectively), and C:N ratios were deter- 
mined using a Costech Elemental Analyzer fitted with a zero-blank auto-sampler 
at James Cook University’s Advanced Analytical Centre, Cairns. The ratio of carbon 
to nitrogen in macroalgae is a stable, longer-term indicator of ambient nutrient 
regimes at a given location and should reflect stable spatial patterns among sites”>°*. 
Specifically, nutrient inputs incorporated into algal tissue are averaged over the 
active growing period (3-6 months before our collection in this region”), reflecting 
nitrogen availability during this time’*. This time period spans the main rainy 
season in Seychelles (December to February), so the nitrogen in these algal tissues 
should reflect land-based nutrient inputs. Although temporal variability in nutri- 
ent input is likely, due to differences in rainfall among years, the spatial patterns 
associated with adjacent land-based input among sites (that is, the relative differ- 
ences in nutrient regimes we are interested in here) are thought to be stable through 
time*®””.

Wave exposure was calculated based on the uninterrupted (by land or other 
reefs) distance winds can blow over the ocean to generate waves (fetch), coupled 
with data on wind speed and direction**'. Larger waves develop with greater fetch 
and stronger winds. We added reef crest polygons to a base map of land mass”, 
and rasterised the map at a spatial resolution of 55 m. Fetch values for each of our
21 reefs were calculated in 32 compass directions (each with an angular width of 11.25°)
using the Waves Toolbox for ArcGIS 10.2 (ref. 43). We restricted fetch calculations
to a maximum distance of 500 km around the Seychelles islands, reflecting wind
generated wave energy for this isolated archipelago. We used hourly readings of 
both wind speed and direction from the Seychelles National Meteorological Service 
for every day of the post-bleaching period (1998 to 2011). Wave energy (in Joules) 
was calculated as a function of fetch, wind speed and direction**“’. We calculated 
average wave exposure for each reef based on hourly wave exposure estimates for 
the entire post-bleaching period, capturing information on strong and sporadic winds. 
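
The averaging of hourly exposure over 32 directional sectors can be sketched as below. This is only an outline under stated assumptions: sectors are taken to be centred on 0°, 11.25°, 22.5° and so on, and `placeholder_energy` is a hypothetical stand-in, because the actual wave energy function comes from the cited references and is not given in the text.

```python
N_SECTORS = 32
SECTOR_WIDTH_DEG = 360.0 / N_SECTORS  # 11.25 degrees per sector


def sector_index(direction_deg):
    """Map a wind direction (degrees) to one of 32 sectors centred on 0, 11.25, ..."""
    shifted = (direction_deg % 360.0) + SECTOR_WIDTH_DEG / 2.0
    return int(shifted // SECTOR_WIDTH_DEG) % N_SECTORS


def placeholder_energy(fetch_km, wind_speed):
    # ASSUMPTION: illustrative proxy only; the study used a published
    # wave energy formulation that is not reproduced here.
    return fetch_km * wind_speed ** 2


def mean_exposure(fetch_by_sector_km, hourly_winds,
                  energy=placeholder_energy, max_fetch_km=500.0):
    """Average an energy function over hourly (speed, direction) readings."""
    total = 0.0
    for speed, direction in hourly_winds:
        fetch = min(fetch_by_sector_km[sector_index(direction)], max_fetch_km)
        total += energy(fetch, speed)
    return total / len(hourly_winds)
```

Passing the energy function as an argument keeps the sector lookup and the 500 km cap separate from whichever wave model is actually applied.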
Benthic trajectory. Changes in benthic composition among sites and through time 
were examined using correlation-based principal components analysis, based on
Euclidian distance. Data were log(x + 1) transformed to account for some right skewness
detected in draftsman’s plots and normalized to ensure all metrics in the analysis
were on a common scale. Eigenvectors were overlaid to identify direction and
contribution of the different variables to the patterns. We defined a regime shifting
reef as one where post-disturbance macroalgal cover becomes greater than coral 
cover and trajectories through time indicate that cover of macroalgae remains high 
or is increasing. Recovering reefs, conversely, are defined as those reefs where post- 
disturbance coral cover becomes greater than macroalgal cover and trajectories 
through time indicate that cover of hard coral remains high or is increasing. To 
determine whether sites were regime shifting or recovering we used four metrics, 
the first based on a static cover estimate, while the other three reflect cover traject- 
ories. These four metrics provide a comprehensive assessment of site status and 
trajectory. Sites must conform to metric 1 and at least one of the trajectory metrics 
(2-4) to be classified as either recovering or regime shifting. 

(Metric 1) Percent cover of coral and macroalgae at last data point (2011). If coral 
cover is higher than macroalgal cover, and greater than the first post-disturbance 
survey (2005), the site is classified as recovering. If macroalgal cover is higher than 
coral cover and greater than pre-disturbance macroalgal cover (1994), the site is 
classified as regime shifting (Extended Data Table 1). 

(Metric 2) The Euclidian distance between the pre-disturbance (1994) benthic 
composition, and that of 2005, 2008 and 2011 was calculated at a site level, to quantify 
if this distance was increasing or decreasing through time. A substantial change from 
1994 and increasing distance through post-disturbance years indicates a regime 
shifting site, whereas declining distance through post-disturbance years is indi- 
cative of recovery (Fig. 1b, c, Extended Data Fig. 3). 
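
The distance calculation behind metric 2 can be illustrated with a short sketch. It assumes the same log(x + 1) transformation and normalization described for the principal components analysis, applied across all site-years together; the percent cover values are hypothetical and the helper names are ours.

```python
import numpy as np


def transform(cover):
    """log(x + 1) transform, then scale each benthic variable to mean 0, s.d. 1."""
    logged = np.log1p(np.asarray(cover, dtype=float))
    sd = logged.std(axis=0, ddof=1)
    return (logged - logged.mean(axis=0)) / np.where(sd > 0, sd, 1.0)


def distances_from_baseline(z_site):
    """Rows ordered 1994, 2005, 2008, 2011; returns the three post-disturbance distances."""
    return np.linalg.norm(z_site[1:] - z_site[0], axis=1)


def trend(distances):
    """Sign of the linear trend in distance across the post-disturbance surveys."""
    slope = np.polyfit(np.arange(len(distances)), distances, 1)[0]
    return "increasing" if slope > 0 else "decreasing"


# Hypothetical percent cover (coral, macroalgae, rubble) for two sites.
site_a = [[40, 0, 5], [5, 0, 30], [15, 0, 20], [30, 0, 10]]
site_b = [[35, 1, 5], [5, 10, 10], [3, 30, 10], [2, 50, 10]]
z = transform(site_a + site_b)  # normalise across all site-years together
dist_a = distances_from_baseline(z[:4])
dist_b = distances_from_baseline(z[4:])
```

Here site A moves back towards its 1994 composition while site B moves away from it, which is the contrast metric 2 is designed to detect.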

(Metric 3) The rate of change in coral or macroalgal cover post-disturbance 
(2005-2011) indicates the trajectory of the site in terms of increasing coral versus 
macroalgae. If rate of coral cover increase remains stable or increases faster than 
rate of macroalgal cover the site is classified as recovering, and vice versa for regime 
shifting (Extended Data Table 1). 

(Metric 4) Change in cover (coral and macroalgae) between 1994 and 2005, and 
1994 and 2011 was calculated. If the net decline in coral cover becomes smaller 
between these two time periods, and change in macroalgae is negligible and static, 
the site is classified as recovering. If the net decline in coral cover remains large and 
static, and increases in macroalgal cover are becoming greater, the site is classified 
as regime shifting (Extended Data Fig. 4). 
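
The decision rule across the four metrics can be written compactly. The sketch below encodes metric 1 as stated and then requires agreement with at least one trajectory metric; the function names and the "unclassified" label are our own illustrative choices.

```python
RECOVERING = "recovering"
REGIME_SHIFTING = "regime shifting"


def metric1(coral_2011, macro_2011, coral_2005, macro_1994):
    """Static cover rule at the last data point (2011)."""
    if coral_2011 > macro_2011 and coral_2011 > coral_2005:
        return RECOVERING
    if macro_2011 > coral_2011 and macro_2011 > macro_1994:
        return REGIME_SHIFTING
    return None


def classify_site(metric1_label, trajectory_labels):
    """A site needs metric 1 plus at least one agreeing trajectory metric (2-4)."""
    if metric1_label is not None and metric1_label in trajectory_labels:
        return metric1_label
    return "unclassified"
```

For example, `classify_site(metric1(30.0, 1.0, 10.0, 0.5), [RECOVERING, None, None])` returns `"recovering"`, using hypothetical cover values.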

To visualize the regime shift and recovery dynamics, we quantified the site level 
relationships between distance from the pre-disturbance benthic composition and 
proportions of coral and macroalgae using a multinomial model of benthic composition
(j), whereby percentages of coral, macroalgae, and other substrates (100 × π) at a
given site (i) were predicted based on Euclidian distances (ED) for a given
year:

$$\beta_{0j} + \beta_{1j}\,\mathrm{ED} \qquad (1)$$

Parameters include an intercept (β0) and slope (β1) for ED in each of the habitat
categories (j). This is a standard multinomial logit model, in this case run using the
nnet package in R (http://www.R-project.org). To illustrate the progression of reef 
states, we ran the same model (1) for each post-disturbance survey (2005, 2008, 
and 2011) and represented uncertainty in model fit by sampling values from the
maximum likelihood and estimated standard deviation of each parameter in (1) and 
plotting the resulting model fits (Fig. 1c and Extended Data Fig. 3). To better convey 
the progression of recovering and regime shifting reefs through time we rotated the 
ED predictor onto the y axis. 
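
How a multinomial logit turns the linear predictor into proportions of coral, macroalgae and other substrates can be shown with a small sketch. This is not the nnet fit itself: it assumes the conventional parameterisation with the first category as reference (coefficients fixed at zero), and the coefficient values are hypothetical.

```python
import numpy as np


def composition_probabilities(ed, beta0, beta1):
    """Softmax over categories j of beta0_j + beta1_j * ED, one row per ED value."""
    ed = np.atleast_1d(np.asarray(ed, dtype=float))
    beta0 = np.asarray(beta0, dtype=float)
    beta1 = np.asarray(beta1, dtype=float)
    eta = beta0[None, :] + np.outer(ed, beta1)
    eta = eta - eta.max(axis=1, keepdims=True)  # guard against overflow
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)


# Hypothetical coefficients: categories are (other, coral, macroalgae).
probs = composition_probabilities(
    ed=[0.0, 2.0, 4.0],
    beta0=[0.0, 0.5, -2.0],
    beta1=[0.0, -0.8, 0.9],
)
```

Multiplying each row by 100 gives predicted percentages, which by construction sum to 100 across the three categories.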
Fish functional structure. The functional structure of fish assemblages was cal- 
culated based on two dominant traits that capture a large proportion of the implicit 
functional roles played by reef fishes: dietary group and body size. We classified the 
fish into well-established feeding groups (corallivores, invertivores, planktivores, 
grazing herbivores, scraping and excavating herbivores, and piscivores) based on 
the literature’ and Fishbase™’. These groupings cover some of the main feeding func- 
tions performed by fishes on coral reefs, including mediation of coral:algae inter- 
actions, removal of sediments, and trophic control through predation. Body size is 
a trait that captures information related to the feeding ecology, energetic demands, 
and movement patterns in reef fish. For example, there are well established non- 
linear relationships between body size and area of reef grazed by parrotfishes”, while 
predators feed on prey in accordance to their gape size’. Furthermore, home range 
size and functional range size in reef fish is tightly correlated to body size, with 
larger fish feeding over a greater area***°. Body size was coded into four categories: 
<20 cm, 20-40 cm, 41-60 cm and >60 cm. We focused on these two biological
traits to specifically target functional traits**. 

Based on these two functional traits, we performed a principal coordinate ana- 
lysis (PCoA) on a Gower distance matrix between species pairs to provide two inde- 
pendent synthetic axes that summarize species distribution within a trait functional 
space*’**. These two independent functional axes from PCoA, in combination with
the species abundance matrix for all reefs in each year of sampling, were used to 
measure functional structure through biomass-weighted mean values for each 
community. We used these biomass-weighted mean values for each PCoA axis to
assess how the functional structure of a given assemblage changed through time for
both recovering and regime shifting reefs. Year and trajectory (regime shift versus 
recovery) had a significant influence on the functional structure of fish assemblages 
on both PCoA axes (ANOVA model: axis 1 year P < 0.001, trajectory P < 0.01; axis
2 year P < 0.001, trajectory P < 0.05), whereas management (marine reserve versus
fished) had no significant influence (ANOVA model: management axis 1
P = 0.533, axis 2 P = 0.106).
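
The steps from traits to biomass-weighted functional structure can be sketched as follows. This is an illustration under assumptions: diet is treated as a nominal trait and body size class as a rank scaled by its range when computing Gower distance, PCoA is done by classical double-centring, and the four species are hypothetical.

```python
import numpy as np


def gower_distance(diet, size_rank):
    """Gower distance from one nominal trait (diet) and one ranked trait (size class)."""
    diet = np.asarray(diet)
    size = np.asarray(size_rank, dtype=float)
    d_diet = (diet[:, None] != diet[None, :]).astype(float)
    size_range = size.max() - size.min()
    d_size = np.abs(size[:, None] - size[None, :]) / (size_range if size_range > 0 else 1.0)
    return (d_diet + d_size) / 2.0


def pcoa(dist, n_axes=2):
    """Principal coordinates via eigen-decomposition of the double-centred matrix."""
    n = dist.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (dist ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_axes]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))


def biomass_weighted_mean(scores, biomass):
    """Community-level mean position on each PCoA axis, weighted by species biomass."""
    weights = np.asarray(biomass, dtype=float)
    return (weights[:, None] * scores).sum(axis=0) / weights.sum()


# Hypothetical species: diet group and body size class (0 = <20 cm ... 3 = >60 cm).
diet = ["corallivore", "grazer", "grazer", "piscivore"]
size = [0, 1, 2, 3]
scores = pcoa(gower_distance(diet, size))
community = biomass_weighted_mean(scores, biomass=[5.0, 40.0, 30.0, 25.0])
```

Repeating the last step for each reef and year gives the values that were compared between recovering and regime shifting reefs.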

Predicting recovery versus regime shift dynamics. We assessed the ability of 
eleven different site level factors (habitat type, initial branching coral cover, juvenile 
coral density, depth, herbivore biomass, herbivore diversity, marine reserve status, 
nutrient regime, initial structural complexity, sea urchin density, wave exposure) to 
predict whether a site went on a recovery or regime shift trajectory. Each factor was 
a priori chosen based on a pre-existing rationale from the literature, to specifically 
target factors known to represent key processes or to structure coral reef ecosystems 
globally (Extended Data Table 2). Indeed, the generic importance of the selected 
variables on coral reefs is highlighted by specific review papers, for example: branch- 
ing corals”; juvenile coral survivorship’; depth”’; marine reserve status”; herbivor- 
ous fish biomass and diversity~’; nutrient regime”’; structural complexity”; and sea 
urchins’. Furthermore, the ranges of values found in Seychelles are consistent with
values from other reef locations in the Indo-Pacific (Extended Data Table 3), indi- 
cating that the relationships we find should be generic, although we might expect 
location-specific variation in thresholds. We could not include some factors, such 
as turbidity and water currents (typically weak in Seychelles); however, we were able
to accurately predict ecosystem trajectory with the covariates included in the model. 

Although some of the factors are not influenced over the time frames investi- 
gated here (for example, depth), or were constructed across time (for example, wave 
exposure), we had to make decisions on which time period to use for other factors. 
Branching coral cover, important for a range of reef fishes and other organisms”, is 
particularly vulnerable to coral bleaching*, so reefs with a high branching coral 
cover pre-disturbance may be expected to undergo extensive degradation, and thus 
we used data from 1994. We used pre-disturbance data for structural complexity 
because it can be maintained for many years post disturbance, ensuring the con- 
tinuation of reef processes’*. For both herbivorous fish biomass and diversity, we 
used data from 2005 to first account for any changes in these variables associated 
with the disturbance event itself*’, and second to reflect the amount of post-disturbance 
herbivory available to influence subsequent benthic responses. We were interested 
in post-settlement survival based on the juvenile coral density variable, so used data 
from 2011 to allow more (often slow growing) corals to make it through early life 
history stages. Sea urchin density (family: Diadematidae) would ideally be taken
from 2005 to reflect herbivory available for post-disturbance benthic responses;
however, the first year of available data was 2008, which correlated well (coefficient
0.57) to subsequent years, suggesting spatial consistency through time. As noted 
above, spatial differences in our nutrient regime data should reflect long-term pat- 
terns. Collinearity among these predictor covariates was assessed (Extended Data 
Fig. 6), and herbivore diversity was excluded from the analysis due to collinearity 
with herbivore biomass. 

The spatial distribution of sites following a regime shift or recovery trajectory 
was to some extent geographically clustered, but some sites in close proximity fol- 
lowed different trajectories (Extended Data Fig. 7). We used a Bayesian hierarchical 
logistic regression model to assess which site-scale variables were best at predicting 
the trajectory of the sites post bleaching (0 = recovered, 1 = regime shifted). Discarding 
variables with posterior densities centred on zero (suggesting no effect), our full 
model for regime shifts among Seychelles sites was: 


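$$y_i \sim \mathrm{Bern}(p_i) \qquad (2)$$
$$\mathrm{logit}(p_i) \sim N(\mu_i, \sigma) \qquad (3)$$
$$\mu_i = \beta_0 + \beta_1\,\mathrm{LHB}_i + \beta_2\,\mathrm{JCD}_i + \beta_3\,\mathrm{DEP}_i + \beta_4\,\mathrm{IST}_i + \beta_5\,\mathrm{CNR}_i \qquad (4)$$
$$\beta_{0,\ldots,5} \sim N(0.0, 1000) \qquad (5)$$
$$\sigma \sim U(0, 100) \qquad (6)$$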


which included log(herbivore biomass + 1) (LHB; kg ha⁻¹); juvenile coral density
(JCD; m⁻²); depth (DEP; m); initial structural complexity (IST; 0-5 scale); and the
carbon nitrogen ratio of sampled algae (CNR). Covariates were then standardized
by subtracting their mean and dividing by two standard deviations, so that the
relative magnitude of their effect sizes gauges their relative importance. Models were
run using the PyMC package™ for the Python programming language
(http://www.python.org).
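
The standardization and the linear predictor in equation (4) can be sketched without reference to any particular sampler. The code below is an illustration only: it uses the sample standard deviation (an assumption), ignores the site-level variance term in equation (3), and any coefficient values supplied to it would be hypothetical rather than fitted.

```python
import numpy as np


def standardise_2sd(x):
    """Centre a covariate and divide by two standard deviations."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (2.0 * x.std(ddof=1))


def inv_logit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def shift_probability(beta, covariates):
    """beta = [b0, b1..b5]; covariate columns ordered LHB, JCD, DEP, IST, CNR."""
    covariates = np.atleast_2d(np.asarray(covariates, dtype=float))
    beta = np.asarray(beta, dtype=float)
    return inv_logit(beta[0] + covariates @ beta[1:])
```

Because every covariate ends up on the same scale, a coefficient of larger absolute size corresponds to a stronger influence on the probability of a regime shift.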

For each of these variables, we calculated marginal 95% uncertainty intervals (UI) 
around the point at which the probability of a regime shift was half (P(shift) = 0.5). 
The lower 95% UI was taken as the point at which a regime shift is more likely than 
not, with all other variables held at their average, while the upper 95% UI was taken 
as the point at which recovery is more likely to occur. The values within these
uncertainty bounds reflect the unstable area where recovery or regime shift out- 
comes are equally likely. 
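
One way to obtain such a threshold from posterior draws is sketched below. It reflects our reading of the procedure rather than the authors' code: with all other standardized covariates at their mean (zero), P(shift) = 0.5 where β0 + βk·z = 0, so z* = −β0/βk for each draw, which is then converted back to original units and summarised by percentiles.

```python
import numpy as np


def threshold_interval(beta0_draws, betak_draws, x_mean, x_sd, level=0.95):
    """Percentile interval for the covariate value at which P(shift) = 0.5."""
    z_star = -np.asarray(beta0_draws, dtype=float) / np.asarray(betak_draws, dtype=float)
    x_star = x_mean + 2.0 * x_sd * z_star  # undo the two-s.d. standardization
    tail = (1.0 - level) / 2.0 * 100.0
    lower, median, upper = np.percentile(x_star, [tail, 50.0, 100.0 - tail])
    return lower, median, upper
```

The arguments `beta0_draws` and `betak_draws` are assumed to be paired samples from the same posterior, and `x_mean` and `x_sd` are the values used when standardizing that covariate.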

In order to assess the predictive ability of the full model, we ran a leave-one-out 
cross validation (Loo-CV) analysis, a permutation procedure whereby the ability 
of the model to successfully predict the ecosystem trajectory (recovery or regime 
shift) of any given site is repeatedly tested with different sites. To do this we randomly 
selected one site and predicted its recovery or regime shift status using a model based 
on the other 20 sites, repeating the procedure 999 times. 
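
The resampling scheme can be sketched as follows. To keep the example self-contained, a small penalised logistic regression fitted by gradient descent stands in for the Bayesian model; that substitution, the penalty and the step settings are our assumptions, and only the loop structure mirrors the procedure described.

```python
import numpy as np


def fit_logistic(x, y, l2=1.0, steps=500, lr=0.1):
    """Stand-in classifier: ridge-penalised logistic regression by gradient descent."""
    design = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(design.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        penalty = l2 * np.r_[0.0, beta[1:]]  # intercept is not penalised
        beta -= lr * (design.T @ (p - y) + penalty) / len(y)
    return beta


def predict(beta, x_row):
    """Predicted class: 1 (regime shift) if the linear predictor is positive."""
    return int(beta[0] + x_row @ beta[1:] > 0.0)


def loo_cv_accuracy(x, y, n_rep=999, seed=0):
    """Repeatedly hold out one randomly chosen site and predict it from the rest."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        i = rng.integers(len(y))
        keep = np.arange(len(y)) != i
        beta = fit_logistic(x[keep], y[keep])
        hits += int(predict(beta, x[i]) == y[i])
    return hits / n_rep
```

Here `x` is a sites-by-covariates array of standardized predictors and `y` holds 0 (recovered) or 1 (regime shifted); the returned value is the proportion of held-out sites predicted correctly.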

We tested the predictive power of the model if reduced to include only two metrics 
(initial complexity of the reef and water depth), both of which can be easily quan- 
tified in surveys and collected on reefs before major disturbance events, making 
them very strong candidates for management planning and informing develop- 
ment of adaptation strategies. This gave a reduced parameter model: 


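$$y_i \sim \mathrm{Bern}(p_i) \qquad (7)$$
$$\mathrm{logit}(p_i) \sim N(\mu_i, \sigma) \qquad (8)$$
$$\mu_i = \beta_0 + \beta_3\,\mathrm{DEP}_i + \beta_4\,\mathrm{IST}_i \qquad (9)$$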


We re-ran the Loo-CV analysis with this subset of predictor variables to assess the 
predictive power that was lost by removing other variables from the analysis. Despite 
the substantial increase in deviance information criterion (DIC) scores between 
models (Mfull DIC = 129; Msubset DIC = 145), the subset model performed equally 
well in correctly predicting the trajectory of our observed reefs. We sampled from 
the posteriors of the two variables from the subset model across their observed 
range to model probability surfaces of regime shift risk when these two metrics are 
both quantified, presented as a contour plot in Fig. 4. 
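
A probability surface of this kind can be built by averaging the inverse-logit prediction over posterior draws at each grid point. The sketch below assumes paired draws of β0, β3 and β4 from the subset model and grids expressed on the standardized scale; it omits the site-level variance term for simplicity.

```python
import numpy as np


def risk_surface(b0, b_dep, b_ist, dep_grid, ist_grid):
    """Posterior mean P(regime shift) on a depth-by-complexity grid.

    Returns an array of shape (len(ist_grid), len(dep_grid)).
    """
    b0 = np.asarray(b0, dtype=float)[:, None, None]
    b_dep = np.asarray(b_dep, dtype=float)[:, None, None]
    b_ist = np.asarray(b_ist, dtype=float)[:, None, None]
    dep, ist = np.meshgrid(np.asarray(dep_grid, dtype=float),
                           np.asarray(ist_grid, dtype=float))
    eta = b0 + b_dep * dep + b_ist * ist
    return (1.0 / (1.0 + np.exp(-eta))).mean(axis=0)
```

The resulting array can be passed directly to a contouring routine, with depth on one axis and initial structural complexity on the other.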

To assess whether these two simple predictor variables (depth and initial struc- 
tural complexity) are likely to be useful predictors of post-disturbance reef con- 
dition elsewhere in the Indo-Pacific, we compiled data from 6 countries (Australia 
(Great Barrier Reef), British Indian Ocean Territory (Chagos), Fiji, Kenya, Maldives, 
Tanzania)**** where pre-disturbance site depth and structural complexity, and
post-disturbance (average 8.4 ± 0.4 years post-disturbance) coral and macroalgal
cover were recorded. The disturbances were mostly climate induced bleaching, but
in combination with crown-of-thorns starfish (COTS) in Fiji and COTS and storms
in Australia. For depth we used data from all 51 reef sites. However, because struc- 
tural complexity can vary based on reef zone (for example, the reef slope versus the 
reef flat), or geomorphology (for example, atolls versus coastal fringing reefs), we 
used a paired design for assessing the role of structural complexity where a site that 
had become dominated by macroalgae was paired with a nearby similar site (for 
example, same zone, geomorphology, and depth) that had recovered to high coral 
cover. This resulted in 7 pairs (n = 14), from 4 of the countries. We ran the Bayesian 
hierarchical logistic regression model used for the Seychelles data, with reef sites 
assigned as having either higher macroalgal cover (1) or coral cover (0) post dis- 
turbance. The Bayesian effect size posterior density distributions for these wider 
Indo-Pacific data were compared to those for the Seychelles reefs (Extended Data 
Fig. 5). Considerable overlap in both the depth (Seychelles: −4.77 [−6.92, −2.62];
Indo-Pacific: −1.67 [−3.74, 0.15]; median and 95% highest posterior density
uncertainty intervals) and complexity (Seychelles: −5.66 [−9.13, −2.69];
Indo-Pacific: −3.29 [−6.94, −0.25]) posterior distributions from the six Indo-Pacific
countries with those from Seychelles, and more than 95% of the density in the 
posterior distributions falling below zero in all cases, suggests that both variables 
are important, generic predictors of macroalgal dominance post-disturbance on 
Indo-Pacific reefs. Interestingly, although the structure of reefs may vary based on 
zone or geomorphology, in 6 of the 7 pairwise comparisons, the reef with the 
greatest structure recovered. On average, reefs that recovered had complexity scores 
0.85 (0.09-1.62 UI) greater than reefs that became dominated by macroalgae.
Extended Data Figure 1 | Changing condition of Seychelles coral reefs.
a, b, Coral reefs of the inner Seychelles were typified by high coral cover and low
macroalgal cover in 1994. c, d, The 1998 coral bleaching event caused
widespread coral loss, but some reefs maintained their structural complexity
(c), while others collapsed (d) by 2005. e, f, In 2011, many reefs had recovered to
high live coral cover (e), while others had undergone a regime shift to abundant
macroalgal cover (f).
Extended Data Figure 2 | Principal components analysis of benthic
composition on 21 reefs across the inner Seychelles 1994-2011. Reefs
coloured blue are tracking back to pre-disturbance benthic composition in
2005-2011 following the 1998 bleaching event, whereas reefs coloured red are
shifting to alternate benthic compositions, dominated by macroalgae (n = 84).
Extended Data Figure 3 | Distance from pre-disturbance benthic
community composition. Euclidian distance in multivariate space, plotted 
against percent cover of dominant biotic benthic organisms (live coral in blue, 
macroalgae in red) (n = 63). a, 2005 data. b, 2008 data. c, 2011 data. Shading 
represents 95% confidence bounds for the mean trend lines of each habitat type. 
Extended Data Figure 4 | Changing coral and macroalgal cover in relation
to pre-disturbance values. a, Data for recovering reefs, where the change in
coral cover compared to 1994 was reducing through time, whereas change
in macroalgae remained stable (n = 42). b, Data for regime shifting reefs where
the decline in coral cover persisted through time, and changes in macroalgae
increased through time (n = 42).
Extended Data Figure 5 | … distributions for depth and initial structural complexity in predicting coral
versus macroalgae outcomes post disturbance in Seychelles versus 6 other
countries across the Indo-Pacific. a, Depth effect size plot, dark blue posterior
distribution for Seychelles, grey for other countries (n = 51). b, Initial structural
… countries (n = 14). Depth and structural complexity variables were
standardized in both analyses before estimation and all posterior distributions
have more than 95% of their density below zero.
Extended Data Figure 6 | Collinearity matrix of the eleven predictor variables.
Extended Data Figure 7 | Map of study sites around the inner Seychelles. Sites in blue are recovering from the 1998 mass bleaching event, whereas sites in red
have undergone a regime shift to macroalgal cover.
Extended Data Table 1 | Coral cover and macroalgal cover at recovering and regime shifting sites in 2011, with the change (based on the
model slope) in coral cover and macroalgae estimates from sites between 2005-2011

Site                    Coral cover (%)   Coral cover (%)   Coral slope       Macroalgae (%)    Macroalgae (%)    Macroalgae slope
                        1994              2011              2005-2011         1994              2011              2005-2011
Recovering sites
Mahe E Patch            19.3              12.3              3.3               0.6               0.1               -0.8
Mahe NW Carbonate       38.9              26.8              8.2               0.0               0.0               0.0
Mahe NW Granitic        10.8              19.0              4.2               0.0               0.0               0.0
Mahe NW Patch           17.3              21.3              3.3               0.0               0.0               0.0
Mahe W Carbonate        34.9              23.9              4.2               0.0               0.0               0.0
Mahe W Granitic         19.3              32.4              8.3               0.0               0.0               0.0
Mahe W Patch            20.8              47.5              9.3               0.0               0.0               0.0
Praslin NE Granitic     17.1              23.8              10.0              0.0               0.1               0.1
Praslin NE Patch        25.4              10.1              2.9               0.0               0.0               0.0
Praslin SW Granitic     17.4              17.5              6.8               0.3               0.0               0.0
Ste. Anne Granitic      38.5              20.4              3.3               1.0               1.5               0.7
Ste. Anne Patch         54.2              18.9              5.7               0.0               8.8               4.4
Average                 26.1 ± 5.5 CI     22.8 ± 5.5 CI     5.8 ± 1.6 CI      0.2 ± 0.2 CI      0.9 ± 1.4 CI      0.4 ± 0.8 CI
Regime shift sites
Cousin Carbonate        49.7              0.4               -0.1              0.4               49.3              4.0
Cousin Granitic         23.3              2.8               0.8               0.6               37.3              16.8
Cousin Patch            38.6              2.8               1.1               0.0               26.0              -4.5
Mahe E Carbonate        28.6              1.9               0.3               12.4              52.3              2.6
Mahe E Granitic         9.3               1.0               2.5               0.7               46.0              22.2
Praslin NE Carbonate    28.3              1.0               0.3               0.0               53.1              21.0
Praslin SW Carbonate    42.1              0.4               -0.1              3.4               77.6              19.6
Praslin SW Patch        17.3              3.4               1.3               4.7               14.5              1.3
Ste. Anne Carbonate     40.1              14.8              2.6               2.0               20.3              3.8
Average                 30.8 ± 8.5 CI     3.14 ± 2.9 CI     0.40 ± 0.9 CI     2.7 ± 2.6 CI      41.8 ± 12.8 CI    9.63 ± 6.6 CI

Average values for coral cover and macroalgal cover in 1994 also given for recovering versus regime shifting reefs.
Extended Data Table 2 | Rationale for predictor variables included in models determining different post-disturbance reef trajectories on
Seychelles reefs 


Predictor Rationale References 


Habitat type: Three habitat types were surveyed in Seychelles, including carbonate fringing
reefs, granitic rocky reefs, and patch reef habitats. These habitats differ in
their underlying matrix that may influence coral recruit success or likelihood of
ecosystem collapse.

Pre-disturbance branching coral cover: Branching corals are particularly vulnerable to coral
bleaching events, and once dead the structure they provide erodes fairly rapidly. Therefore, reefs
with a high cover of branching coral before a disturbance may be particularly
vulnerable to extensive coral loss and a reduction in other organisms.

Juvenile coral density: Successful settlement, survival and growth of new corals is thought to be key
to coral recovery dynamics.

Depth: Many threats on coral reefs are worst in shallow water, making shallow areas
most vulnerable to change. Light penetration in shallow water may also favour
rapid growth of fleshy macroalgae.

Herbivorous fish biomass: Herbivorous fish are key to mediating competition for space among corals
and algae. Biomass of these fish is a good proxy for function as the area of a
reef grazed by these fish scales with both abundance and body size.

Herbivorous fish diversity: Many types of algae have defences against herbivores, meaning that only
certain species of fish can feed on some species of algae. This differential
ability of fish species to control algae can mean that a diversity of
herbivorous species is required to provide the feeding complementarity
necessary to control macroalgae.

Marine reserve status: No-take marine reserves are expected to reduce fishing and hence enhance
ecosystem processes, and may therefore promote faster rates of coral
recovery.

Nutrient regime: Higher nutrient loads in the waters around reefs can enhance the growth of
algae and result in algae outcompeting corals, particularly when space
becomes available through coral mortality.

Pre-disturbance structural complexity: The structural complexity of a reef provides a great deal
of the habitat variability for a diverse array of other organisms to inhabit. This in turn
contributes to enhance a range of ecological processes, and provides niche
space for coral settlement and survival. Structural complexity prior to a
disturbance is expected to maintain ecosystem processes through the
disturbance.

Sea urchin density: Sea urchins are important herbivores on coral reefs, helping to control algal
growth and promote successful coral recruitment and recovery.

Wave exposure: Wave exposure influences coral distribution patterns, growth forms and
colony sizes that will likely affect recovery trajectories. Similarly, algal growth
can be enhanced with higher flow rates due to increased exposure to water
borne nutrients, but algal dislodgement can occur where wave exposure is
strong.


32,59,60 
Including references 59-72.


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Table 3 | Mean values with 95% confidence intervals for predictor covariates in Seychelles compared to other coral reef
locations where similar data for the covariates were available

Covariate                               Seychelles  95% CI   N     Other locations  95% CI   N     Countries
Herbivore biomass                       210.3       59.0     21    258.1            33.8     281   Kenya, Madagascar, Maldives, Mauritius, Mayotte, Reunion, Tanzania
Pre-disturbance coral cover             28.1        5.4      21    29.0             3.9      60    Kenya, Tanzania
Pre-disturbance structural complexity   3.2         0.1      21    3.1              0.2      30    British Indian Ocean Territory (Chagos), Great Barrier Reef (Australia)
Urchin abundance                        22          0.8      21    441              2.0      33    Kenya, Madagascar, Mozambique, Tanzania


Data for regions other than Seychelles were provided directly by the authors of these studies?*73.74. 
Transferred interbacterial antagonism genes
augment eukaryotic innate immune function 


Seemay Chou'*, Matthew D. Daugherty***, S. Brook Peterson’, Jacob Biboy*, Youyun Yang”, Brandon L. Jutras®’, 
Lillian K. Fritz-Laylin®, Michael A. Ferrin’, Brittany N. Harding’, Christine Jacobs-Wagner®”"'°, X. Frank Yang”, 


Waldemar Vollmer*, Harmit S. Malik? & Joseph D. Mougous! 


Horizontal gene transfer allows organisms to rapidly acquire adap- 
tive traits’. Although documented instances of horizontal gene transfer 
from bacteria to eukaryotes remain rare, bacteria represent a rich 
source of new functions potentially available for co-option’. One ben- 
efit that genes of bacterial origin could provide to eukaryotes is the 
capacity to produce antibacterials, which have evolved in prokary- 
otes as the result of eons of interbacterial competition. The type VI 
secretion amidase effector (Tae) proteins are potent bacteriocidal 
enzymes that degrade the cell wall when delivered into competing 
bacterial cells by the type VI secretion system’. Here we show that tae 
genes have been transferred to eukaryotes on at least six occasions, 
and that the resulting domesticated amidase effector (dae) genes have 
been preserved for hundreds of millions of years through purifying 
selection. We show that the dae genes acquired eukaryotic secretion 
signals, are expressed within recipient organisms, and encode active 
antibacterial toxins that possess substrate specificity matching extant 
Tae proteins of the same lineage. Finally, we show that a dae gene 
in the deer tick Ixodes scapularis limits proliferation of Borrelia 
burgdorferi, the aetiologic agent of Lyme disease. Our work demon- 
strates that a family of horizontally acquired toxins honed to mediate 
interbacterial antagonism confers previously undescribed antibac- 
terial capacity to eukaryotes. We speculate that the selective pressure 
imposed by competition between bacteria has produced a reservoir 
of genes encoding diverse antimicrobial functions that are tailored 
for co-option by eukaryotic innate immune systems. 

Eukaryotes can acquire new functions through the exchange of genetic 
material with other domains of life’. Indeed, bacteria-to-eukarya hori- 
zontal gene transfer (HGT) underlies the adaptation and diversifica- 
tion of many microbial eukaryotes, such as algae, choanoflagellates and 
protozoa**. The acquisition of bacterial genes by metazoans is rare. 
Among the transferred genes, many are not expressed and have no known 
function’, while others have roles in endosymbiont maintenance’®. Rela- 
tively few reports provide evidence of transferred elements that confer 
traits which are directly beneficial to their metazoan recipients’. One 
recent example is the discovery that phytophagous mites and Lepidop- 
tera species exploit a horizontally acquired bacterial cysteine synthase 
to feed on plants producing cyanogenic defence compounds”. 

Genes that can independently provide new functionality to a recipient 
organism are strong candidates for domestication after HGT°”°. The Tae 
proteins are small, single-domain enzymes that can rapidly digest the 
bacterial cell wall’. These proteins comprise four phylogenetically dis- 
tinct families (Tae1-4) that share no overall sequence homology and 
display unique specificities against peptidoglycan (PG)’””. In the course 
of probing tae distribution, we made the serendipitous observation
that homologues are found in distantly related eukaryotic genomic and
expression data sets ranging from unicellular protozoa to multicellular 
metazoans (Fig. 1a). The genes did not appear to derive from contami- 
nating bacterial DNA; most contain introns and are located in genomic 
regions flanked by eukaryotic genes (Extended Data Fig. 1). We there- 
fore refer to these eukaryotic loci as domesticated amidase effector (dae) 
genes, and hypothesized that they encode antibacterial toxins horizon- 
tally acquired from bacteria. Maximum likelihood and Bayesian phylo- 
genetic analyses revealed that trees of bacterial tae2-4 families each 
contained two distinct monophyletic clades of eukaryotic dae genes 
(Fig. 1b and Extended Data Figs 2-4). Thus, we conclude that three of 
the four known tae gene families have been acquired by eukaryotes from 
diverse bacteria in at least six HGT events (Fig. 1a). Our survey is biased 
by the status of genome sequencing efforts; therefore, these six instances 
are probably an underestimate of eukaryotic tae acquisitions. 

Three of the dae genes we found are limited to individual or closely 
related eukaryotes (Fig. 1a, light green, light blue and dark blue). These 
could represent recent HGT events, or reflect limited genomic and tran- 
scriptomic sampling of related species. The remaining three dae genes 
appear to be the result of ancient HGT events. For instance, we found 
dae2 in ten species of ticks and mites (Fig. 1b). This dense sampling, a 
shared intron between the dae2 genome sequence of I. scapularis and 
Metaseiulus occidentalis, and the fact that the tick and mite dae2 gene 
phylogeny closely resembles the established phylogeny of these organisms, 
lead us to conclude that vertical transmission followed a single HGT 
event of a bacterial tae2 gene to the common ancestor of ticks and mites 
approximately 400 million years (Myr) ago (Figs 1b-d and Extended 
Data Figs 1, 5a, b)'*. The complete genome sequence of the Acariform 
mite Tetranychus urticae does not possess dae2, indicating that loss of 
the gene has also occurred. Partial dae2 sequences in the genomes of 
two scorpion species and the horseshoe crab share an intron position 
with dae2 from ticks and mites, suggesting that dae2 introduction into 
arthropods may have occurred as early as 550 Myr ago (Extended Data 
Fig. 5c). Similarly, dense sampling of dae4 genes in gastropod and bivalve 
mollusks, as well as a shared dae4 intron position across all sampled 
mollusks and an annelid, dates the origin of dae4 in these animals to 
at least 400 Myr ago (Fig. 1a, light red, and Extended Data Figs 1, 4)*. 
Finally, a second dae4 present in a species of choanoflagellates, sea anem- 
ones, acorn worms and lancelets is most parsimoniously explained by a 
single HGT event followed by vertical inheritance and loss in multiple 
lineages, dating this dae4 acquisition to before the base of the metazoan 
lineage (>800 Myr ago) (Fig. 1a, dark red, and Extended Data Fig. 4). 
However, owing to sparse sampling and lack of evidence of shared synteny, 
we cannot rule out more recent HGT to and between these eukaryotic 
Figure 1 | Recurrent horizontal gene transfer of tae genes into diverse 
eukaryotic lineages. a, Schematized phylogenetic tree of basal eukaryotic 
lineages”” showing instances of tae transfer (arrows) from bacteria to 
eukaryotes, coded by colour (tae family) and shading (acquisition events). 

b, Maximum likelihood phylogenetic tree of tae2 and dae2 genes. 
Representatives are boxed and colour-coded according to a. Branch support 
>0.7 indicated by asterisks or numbers. Scale bar shows estimated divergence 


lineages*. In summary, we find compelling evidence that at least two 
animal lineages have retained a bacterially derived antibacterial gene for 
hundreds of millions of years. 

Several lines of evidence led us to hypothesize that dae genes provide 
an adaptive function to their eukaryotic hosts. We found strong signa- 
tures of purifying selection acting on dae2 and dae4 genes (Extended 
Data Table 1). Additionally, eukaryotic Sec signals were identified in the 
majority of Dae proteins, including representatives from each of the 
predicted HGT events (Extended Data Fig. 6). Secretion of bacterial Tae 
proteins occurs through the Sec-independent type VI secretion system 
(T6SS); thus, acquisition of a Sec signal is indicative of functional spe- 
cialization involving export from eukaryotic cells. Finally, the majority 
of Dae proteins possess the cysteine-histidine catalytic dyad and flank- 
ing motifs of their corresponding Tae families, consistent with retention 
of enzymatic activity (Extended Data Fig. 6). 

We next sought evidence for expression of eukaryotic dae homologues 
belonging to each of the transferred bacterial tae families. We found dae2 
expression during both the unfed nymphal and unfed adult life stages 
of the hard tick I. scapularis, with levels significantly elevated in adults 
(Fig. 2a). In the amoeba Naegleria gruberi, we observed a basal level of 
expression of each of the three dae3 homologues in trophozoite (amoeba) 
cells, which increased during differentiation into flagellates (Fig. 2b). 
A published expression profile of the lancelet Branchiostoma floridae 
indicates that expression of dae4 is enriched at the neurula stage of 
development’>. Together, these data strongly support the hypothesis that 
dae genes have been functionally integrated into recipient physiology. 

The Tae families display unique specificities against PG. Within PG 
typified by Gram-negative Proteobacteria, enzymes from families 1 and 4 


in amino acid changes per residue. Dashed lines highlight separate HGT events. 
c, Schematic alignment of tick (I. scapularis) and mite (M. occidentalis) dae2 
genes with shared (red line, asterisk) and unique (vertical lines) intron positions 
denoted. Aligned residues surrounding the splice site are shown (boxed) with 
conserved amino acids indicated (grey). d, Tick and mite phylogeny with 
approximate dates of divergence based on concordance with the dae2 gene 
tree (c)?. 


cleave at the γ-D-glutamyl-meso-diaminopimelic acid (mDAP) bond, 
whereas those from families 2 and 3 cleave the mDAP-D-alanine bond 
crosslinking the peptide stems (Fig. 2c)*’*'®. To test whether Dae pro- 
teins can hydrolyse PG, we incubated purified Dae2, Dae3 and Dae4 
representatives from I. scapularis, N. gruberi and B. floridae, respectively, 
with isolated Escherichia coli PG sacculi. High-performance liquid chro- 
matography (HPLC) analysis of reaction products demonstrated that 
each of the enzymes hydrolyses PG (Fig. 2d, e). Remarkably, Dae2, Dae3 
and Dae4 show substrate specificity matching that of the characterized 
extant Tae homologues within corresponding families (Fig. 2c). These 
data support the hypothesis that dae homologues, derived from three 
tae families, have been retained in eukaryotic genomes due to their PG 
amidase activity. We did not find evidence supporting the transfer of 
housekeeping bacterial amidases to these organisms, leading us to spec- 
ulate that genes encoding T6S effectors—enzymes that intoxicate recipient 
cells at exceedingly low concentrations—might be especially amenable 
to preservation after HGT””. 

Within eukaryotes, enzymes with PG-degrading activity might have 
immunoregulatory roles, or act directly as antibacterial factors like the 
Tae toxins'®. To explore the functional significance of a domesticated 
tae gene, we focused on dae2 from the deer tick I. scapularis, an impor- 
tant vector for numerous diseases, including Lyme borreliosis and 
anaplasmosis"’. Western blot analysis of adult I. scapularis demonstrated 
that Dae2 is present in the salivary glands and midgut (Fig. 3a). I. scapularis 
is an ectoparasite that requires a blood meal for life-stage transitions; 
pathogens are typically acquired during feeding and transmitted to a new 
host at the next blood meal. Accordingly, the midgut and salivary glands 
interface with bacterial pathogens and influence their transmission”. 
To understand how Dae2 could contribute to innate bacterial defence 
within I. scapularis, we tested its capacity to cleave diverse PG structures 
representative of bacteria the organism encounters in the environment”. 
Consistent with its ability to degrade E. coli PG, we found that Dae2 
degrades a related form of the cell wall present in Firmicutes belonging 
to the class Bacilli (Extended Data Fig. 7a)'°. We did not detect cleavage 
of the lysine-type PG found in Streptococcus pneumoniae, which repre- 
sents the second major PG type found in Firmicutes (Extended Data 
Fig. 7b). Although the ultrastructure of the B. burgdorferi sacculus is not 
well defined, its amino acid composition appears to differ from that of 

Figure 2 | Eukaryotic dae genes encode differentially expressed PG amidases
with conserved specificity. a, b, Expression profile of I. scapularis and
N. gruberi dae genes at the indicated life stages as measured by polymerase
chain reaction with quantitative reverse transcription (qRT-PCR). Levels of
each N. gruberi dae3 gene (dae3.1-3.3) were determined. Error bars show
± standard deviation (s.d.), n = 3. c, Schematic representation of typical
Gram-negative PG showing cleavage sites (red lines) for Tae and Dae
families 2-4 (colours correspond to Fig. 1a). d, Partial HPLC chromatograms
(elution time in min) of E. coli PG sacculi products resulting from incubation
with buffer (control), wild-type (WT) and catalytically inactive (C43A, C79A,
C89A) Dae enzymes and cellosyl. e, Major HPLC peaks assigned previously by
mass spectrometry correspond to disaccharide-linked tetrapeptide (i),
pentapeptide (ii), tetrapeptide-tetrapeptide (iii), pentapeptide-tetrapeptide
(iv) and dipeptide (v)’.

well-characterized bacterial cell walls”. Incubation of B. burgdorferi sac- 
culi with Dae2 led to the accumulation of specific enzymatic degradation 
products, indicating that the cell wall of this organism is also a substrate 
of the amidase (Extended Data Fig. 8). 

The Dae proteins are reminiscent of an evolutionarily conserved group 
of bacteriophage-related eukaryotic innate immune amidases, the PG 
recognition proteins (PGRPs)'*. Some PGRPs are directly bacteriocidal 
and act by hydrolysing PG, whereas others exert antibacterial activity 
through alternative mechanisms”. We found that exogenous Dae2 is 
not toxic to intact E. coli cells. By contrast, Dae2, but not a catalytically 
inactive variant of the enzyme (C43A), administered to outer-membrane- 
permeabilized E. coli or targeted to the periplasm via the Sec pathway, 
is highly lytic (Fig. 3b-d). Moreover, exogenous Dae2 is bacteriocidal 
against B. subtilis, which has cell-surface-exposed PG (Fig. 3d). Together, 
these results strongly suggest that Dae2-dependent antibiosis is solely 
the result of its amidase activity and that the enzyme would require outer 
membrane permeabilizing agents such as antimicrobial peptides to act 
in vivo. 

B. burgdorferi is the causative agent of Lyme disease, the most preva- 
lent vector-borne illness in the United States**. Given the antibacterial 


Figure 3 | Dae2 is a bacteriolytic toxin that restricts the proliferation of 
B. burgdorferi in the tick I. scapularis. a, Western blot analysis of Dae2 in 
unfed adult and nymphal total tissue (total), midgut (MG), salivary gland (SG) 
and haemolymph (HL) extracts from I. scapularis. Recombinant Dae2 
protein (RC) and tissue from a closely related species, Dermacentor variabilis 
(control), were included. Actin levels were examined as a loading control. 

b, Lytic activity of lysozyme (Lys) and Dae2 (wild type (WT), C43A) proteins 
against permeabilized E. coli. Error bars show ± s.d., n = 3. ND, not detected. 
c, Growth of E. coli expressing native (cyto-) or periplasm-targeted (peri-) 
Dae2 proteins. OD600 nm, optical density at 600 nm. Error bars show ± s.d., 
n = 3. d, Bacterial killing activity of indicated proteins against B. subtilis (Bs) 
and E. coli (Ec) cells. c.f.u., colony-forming units. Error bars show ± s.d., n = 3. 
e, Dae2 transcript levels quantified by qRT-PCR in RNAi-treated engorged 
ticks. f, At 2 weeks post-engorgement, spirochaete levels were quantified in 
ticks that had received the indicated RNAi treatments, using qPCR analysis 
of flaB, a B. burgdorferi-specific gene, and normalized to TROSPA, a tick- 
specific gene. n = 20. Each data point in e and f represents three nymphs. 
Horizontal bars represent mean values, which were significantly different in a 
two-tailed nonparametric Mann-Whitney test (P < 0.05). 
activity of Dae2 (Fig. 3b-d), its ability to cleave B. burgdorferi PG in vitro 
(Extended Data Fig. 8), and its localization to sites that interface with 
bacteria (Fig. 3a), we hypothesized that Dae2 could have a role in 
regulating B. burgdorferi populations in I. scapularis. We tested this 
possibility using RNA interference (RNAi)-mediated knockdown of 
dae2 (Fig. 3e). RNAi-treated nymphal ticks were fed to repletion on 
B. burgdorferi-infected mice, and spirochaete load was assessed at en- 
gorgement and again after 2 weeks. At repletion, we observed no de- 
tectable difference in B. burgdorferi levels in control and experimental 
RNAi-treated ticks, indicating that Dae2 activity does not limit initial 
acquisition of the bacterium (Extended Data Fig. 9a). By contrast, at 2 
weeks post-engorgement, B. burgdorferi levels were significantly ele- 
vated in the dae2 knockdown group (Fig. 3f). The effect of Dae2 disrup- 
tion on B. burgdorferi levels is unlikely to be due to variations in tick 
feeding or general fitness, as we observed no difference between the groups 
in engorgement weights at either time point (Extended Data Fig. 9b). 
Furthermore, overall bacterial load was similar between the groups, sug- 
gesting that the increase in B. burgdorferi did not result from gross 
changes in populations of tick-associated microbes (Extended Data 
Fig. 9c). The ability of Dae2 to act on a wide range of bacterial cell walls 
leaves open the possibility that compositional changes to the tick micro- 
biome may contribute to the effect of the knockdown on B. burgdorferi’. 
On the basis of these findings, we conclude that Dae2 contributes to the 
innate ability of I. scapularis to control B. burgdorferi levels after its acqui- 
sition. This has potential ramifications for Lyme disease transmission, 
as spirochaete load in the tick can influence transmission efficiency”*. 

We demonstrate that bacterial genes encoding antibacterial effectors 
of the T6SS have been horizontally transferred to diverse eukaryotes. 
The recurrent and independent transfer of tae genes to distinct eukar- 
yotic lineages suggests that these toxins can confer immediate fitness 
benefits by supplying new function to the innate immune system’®. Recent 
studies have revealed that the number and diversity of factors that medi- 
ate interbacterial antagonism is greater than once appreciated. Thus, 
we speculate that competition between bacteria generates a reservoir of 
genes—beyond the tae superfamily—with the potential to confer anti- 
microbial capacity to eukaryotes upon acquisition. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

Computational searches. Homologues of tae were searched for using iterative 
PSI-BLAST™. First, bacterial homologues were assembled using PSI-BLAST searches 
limited to bacterial sequences in the non-redundant (NR) protein database. Sequences 
with e-values <1 × 10°!” (for tae2) or <1 × 10° ”° (for tae1, tae3 and tae4) and 
greater than 50% query coverage were included in successive rounds until no new 
homologues were identified. Position-specific scoring matrix (PSSM) was used to 
query the NR database limited to eukaryotic sequences. Eukaryotic homologues 
with e-values < 1 × 107° were used to initiate an iterative PSI-BLAST search for 
eukaryotic proteins, as described earlier (e-value cut-off 1 × 10” °). Eukaryotic homo- 
logues were validated by the presence of introns or flanking eukaryotic genes. To 
validate the Oxytricha trifallax dae3 gene identified in the macronucleus genome, 
we searched the unpublished micronucleus genome (http://oxy.ciliate.org/blast/) 
for evidence of a fragmented dae3 gene that would be consistent with gene rear- 
rangement in this species**. Expressed sequence tag (EST), whole-genome sequenc- 
ing (WGS) and transcriptome databases were searched with tBlastN” using validated 
dae genes. We acknowledge the deposition of unpublished data into these data- 
bases from multiple sources, including Baylor College of Medicine Human Genome 
Sequencing Center (https://www.hgsc.bcm.edu), The Genome Institute at Washington 
University School of Medicine (http://genome.wustl.edu), the US Geological Survey 
(http://www.usgs.gov), the Functional Genomics Center Zurich (http://www.fgcz. 
ch), the Joint Genome Institute (http://jgi.doe.gov) and the Broad Institute (http:// 
www.broadinstitute.org). Hits from EST or transcriptome databases were accepted 
in cases where the hit was more closely related to a validated dae gene than a bac- 
terial tae gene. When gene predictions based on genomic sequences differed from 
experimental data from EST or transcriptome databases, we used experimental data 
to confirm or modify the predicted protein sequence. For instance, the predicted 
dae2 gene from I. scapularis (NCBI protein database accession gi|242000170) lacks 
a secretion signal, whereas the sequence from EST data (NCBI EST database acces- 
sion gi|156264544) differs from the sequence in the protein database in the first exon, 
resulting in a strongly predicted secretion signal similar to the other tick sequences. 
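
The inclusion rule and the "repeat until no new homologues" loop described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Hit` record, the function names and the `search_round` callback are hypothetical, and the e-value cut-off is left as a caller-supplied parameter rather than a value taken from the study. Only the 50% query-coverage requirement comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Hit:
    """One alignment hit (hypothetical record, not a BLAST API type)."""
    subject_id: str
    evalue: float
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive


def query_coverage(hit: Hit, query_length: int) -> float:
    """Fraction of the query sequence spanned by the alignment."""
    return (hit.query_end - hit.query_start + 1) / query_length


def select_homologues(hits: Iterable[Hit], query_length: int,
                      max_evalue: float, min_coverage: float = 0.5) -> List[Hit]:
    """Keep hits below the e-value cut-off with more than min_coverage coverage."""
    return [h for h in hits
            if h.evalue < max_evalue
            and query_coverage(h, query_length) > min_coverage]


def iterate_until_converged(search_round: Callable[[Set[str]], Iterable[str]],
                            seeds: Iterable[str]) -> Set[str]:
    """Repeat search rounds until a round contributes no new identifiers.

    search_round stands in for one filtered search seeded with the
    identifiers found so far (an assumption of this sketch).
    """
    known = set(seeds)
    while True:
        new = set(search_round(known)) - known
        if not new:
            return known
        known |= new
```

Keeping the cut-off as an argument makes it explicit that different thresholds were applied to different tae families.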
Phylogenetic and evolutionary analysis. Bacterial and eukaryotic sequences were 
aligned using MUSCLE” and edited using Geneious*’. Regions encompassing the 
catalytic domain were used for phylogenetic analyses; sequences with >98% iden- 
tity were excluded. The best-fitting evolutionary model was determined by Prottest*. 
Maximum likelihood phylogenetic trees were generated with PhyML* using 500 
bootstrap replicates. To validate phylogenetic inferences, Bayesian Markov chain 
Monte Carlo (MCMC) analyses were performed in MrBayes™, sampling every 500 
generations until the standard deviation of split frequencies was <0.01 or 10° 
generations were sampled. Tests for purifying selection were performed on aligned 
and degapped nucleotide sequences of dae or tae genes. Whole gene non-synonymous/ 
synonymous (dN/dS) ratio calculations, as well as statistical tests for purifying or 
positive selection for individual codons, were performed using SLAC in the HyPhy 
software suite**. Additional statistical tests in the HyPhy software suite (REL and 
FUBAR) confirmed that several tae and dae codons display statistically significant 
signatures of purifying selection. No codons demonstrate signatures of positive 
selection. N-terminal eukaryotic secretion signals were predicted using SignalP*° 
using default cut-off values. Sequence logos were constructed using Geneious. 
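
As an illustration of the redundancy filter mentioned above (sequences with >98% identity excluded), the sketch below computes pairwise identity over aligned, gap-free columns and greedily keeps the first representative of each near-identical group. The text does not say which member of a redundant pair was retained or how identity was computed, so the greedy order and the treatment of gaps are assumptions of this example, and the function names are hypothetical.

```python
from typing import Dict


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns (assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def drop_near_duplicates(aligned: Dict[str, str],
                         threshold: float = 98.0) -> Dict[str, str]:
    """Keep a sequence only if it is <= threshold % identical to all kept ones."""
    kept: Dict[str, str] = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) <= threshold
               for other in kept.values()):
            kept[name] = seq
    return kept
```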
DNA libraries. For N. gruberi cDNA libraries, strain NEG was grown on Klebsiella 
and differentiated using standard protocols”. Synchrony was estimated by per- 
centage of flagellates after fixing in Lugol’s iodine (n > 100 per time point)**. 10” 
cells were harvested per sample. For I. scapularis cDNA libraries, ticks from the 
Tick-Rearing Center at Oklahoma State University were homogenized by grinding 
in liquid nitrogen. RNA and DNA were purified from I. scapularis and N. gruberi 
samples with TRIzol reagent (Invitrogen) according to the manufacturer’s instruc- 
tions. Contaminating genomic DNA in RNA samples was removed by treatment 
with Turbo DNase (Invitrogen) for 1 h at 37 °C, followed by a second TRIzol purifi- 
cation. DNA contamination was checked by PCR using actin- or GAPDH-specific 
primers for I. scapularis and N. gruberi, respectively. cDNA libraries were synthe- 
sized using the iScript cDNA synthesis kit (Biorad). 

Expression of Dae proteins. The codon-optimized dae genes from I. scapularis 
(dae2), N. gruberi (dae3) and B. floridae (dae4) with predicted signal sequences 
removed were synthesized by GenScript and cloned into the pHis-sumo express- 
ion vector. Shuffle T7 pLysS cells were transformed with plasmids, and expression 
was induced at an optical density (OD600 nm) of 0.6 with 0.1 mM isopropyl-β-D-
thiogalactopyranoside for 20 h at 18 °C. Cells resuspended in 20 mM HEPES pH 
7.5, 0.5 M NaCl, 25 mM imidazole were lysed by sonication. Lysate was cleared by 
centrifugation for 1 h at 18,000g, and proteins were purified with a metal-chelating 
affinity column. The tag was proteolytically removed with the H3C and separated 
from proteins using a second affinity column and size exclusion chromatography 
(GE Healthcare). 

Sacculus analysis. PG sacculi from E. coli D456 (ΔdacA ΔdacB ΔdacC) were purified 
as previously described*””. Preparations (300 μg) were incubated with Dae2 (1 μM), 
Dae3 (10 μM) or Dae4 (10 μM) in 300 μl of 20 mM HEPES pH 7.5, 100 mM NaCl 
for 4 h at 37 °C. PG sacculi from B. subtilis 168 (300 μg) or from S. pneumoniae R6 
(120 μg) were incubated with Dae2 (6 μM) for 4 h at 37 °C. The samples were 
digested with cellosyl, reduced and analysed by HPLC using published methods”. 
For preparations from B. burgdorferi, the B31-MI-16 strain, an infectious clone of 
the sequenced type strain B31, was cultured at 34 °C to early mid-log exponential 
growth’. Cultures were chilled on ice for 10 min and gently harvested by cen- 
trifugation at 3,250g for 15 min. Pelleted cells were washed three times and resus- 
pended in cold PBS. Cell suspensions were added, drop-wise, to 6 ml of 8% SDS 
and boiled for 30 min. PG was prepared, incubated with Dae2 (1 μM) for 4 h at 
37 °C, and analysed as previously described”, with the exception of HPLC buffer 
B, which contained 30% methanol. 

Western blot analysis. Tissues were dissected from I. scapularis ticks purchased 
from the Tick-Rearing Center at Oklahoma State University. A rabbit polyclonal anti- 
body specific for I. scapularis Dae2 was generated by GenScript using a synthetic 
peptide corresponding to Dae2 amino acids 123-136 (RYGNTGKPNYNGDN, 
Lot #195690-4). Mouse anti-actin antibody from Abcam (GR14272-8) and anti- 
rabbit (A6154) and anti-mouse (A4416) horseradish peroxidase (HRP)-conjugated 
secondary antibodies from Sigma were used. Western blot analyses and imaging 
were performed as previously described“*. Four replicate analyses of tissues were 
performed; a representative blot is shown in Fig. 3a. 

Growth curves. E. coli growth curves were generated as previously described'’. The 
vector pSCRHAB2 was used for expression of cytoplasmic I. scapularis Dae2, and 
the pSCRHAB2 vector with a pelB leader sequence inserted was used for expression 
of periplasmic Dae2. Curves are representative of three biological experiments and 
contain technical triplicates. 

Lysis assays. Assays were performed as previously described”. E. coli reactions were 
carried out at enzyme concentrations of 1 μM; B. subtilis reactions were carried out 
at concentrations of 1 μM (lysozyme) and 50 μM (Dae2). Curves are representative 
of three biological experiments and contain technical triplicates. 

Bacterial killing assays. Colonies of E. coli or B. subtilis cells grown on solid LB 
media were washed in 0.2 PBS pH 6 and resuspended to an OD600 nm of 0.1 and 
0.01, respectively. Cells were incubated with recombinant Dae2 enzyme (wild type 
or C43A) at room temperature for 3 h, and serial dilutions were plated on solid LB. 
Viability was quantified by enumeration of colony-forming units. Curves contain 
four technical replicates. 
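
For readers unfamiliar with the arithmetic behind colony counting, the following sketch converts a plate count from a serial dilution into colony-forming units per millilitre and expresses killing as a log10 reduction. The plated volume, dilution factor and function names are illustrative assumptions; the text specifies only that serial dilutions were plated and colonies enumerated.

```python
import math


def cfu_per_ml(colonies: int, dilution_factor: float,
               plated_volume_ml: float) -> float:
    """Back-calculate viable cells per ml of the undiluted suspension.

    dilution_factor is the total fold dilution of the plated sample,
    for example 1e4 for a 10^-4 dilution.
    """
    if dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution factor and plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def log10_reduction(cfu_control: float, cfu_treated: float) -> float:
    """Log10 drop in viability of treated cells relative to the control."""
    return math.log10(cfu_control / cfu_treated)
```

For example, 42 colonies from 0.1 ml of a 10^-4 dilution correspond to roughly 4.2 × 10^6 c.f.u. per ml.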

Mouse and RNAi experiments. All animal experiments and tick protocols were 
approved by the Institutional Animal Care and Use Committee at Indiana Univer- 
sity. The low-passage, virulent B. burgdorferi strain 5A4NP1, a derivative of B31- 
ML, was a gift from H. Kawabata and S. Norris, University of Texas Health Science 
Center. The strain was cultivated in Barbour-Stoenner-Kelly (BSK-II) medium 
supplemented with 6% normal rabbit serum (Pel Freez Biologicals) at 35 °C with 
5% CO2. Kanamycin was added to the culture at 300 μg ml−1. The mouse feeding 
experiments were conducted in the Vector-borne Diseases Laboratory at Indiana 
University School of Medicine. Briefly, 4-week-old C3H/HeN mice were needle- 
infected with B. burgdorferi (10° spirochaetes per mouse). Two weeks post-inoculation, 
mouse infection was confirmed by cultivation of ear-punch biopsy specimens to 
assess spirochaete growth. A single growth-positive culture was used as the criterion 
for infection of each mouse. 

RNAi in nymphal ticks was performed using previously described protocols**. 
To generate double-stranded RNA (dsRNA), 374 bp of I. scapularis dae2 and 356 bp 
of the green fluorescent protein gene (gfp) were amplified using the following 
primers containing the T7 promoter: gfp_RNAi_F, GAGCTCTAATACGACTC 
ACTATAGGGAGAGTGTGAGTTATAGTTGTATTCCAAT; gfp_RNAi_R, GG 
TACCTAATACGACTCACTATAGGGAGAGTGGAGAGGGTGAAGGTGATG 
CAAC; dae2_RNAi_F, CTAGTCGAGCTCTAATACGACTCACTATAGGGA 
GACGCTCGTGGTCCTGGGAT; dae2_RNAi_R, CTAGTCGGTACCTAATAC 
GACTCACTATAGGGAGAGTTGTAGTTGGGCTTCCCTGTA. dsRNA was syn- 
thesized and purified from PCR products using a commercial kit (Megascript RNAi 
Kit, Ambion), and resuspended into elution buffer (10 mM Tris-HCl pH 7, 1 mM 
EDTA), aliquoted, and stored at −20 °C until further use. 
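
The primer design can be checked with a short script. The sketch below rejoins the four primer sequences listed above (line breaks removed) and splits each at the T7 promoter into a 5′ extension, the promoter, and the downstream portion. The 20-nt promoter string `TAATACGACTCACTATAGGG` is the commonly used minimal T7 promoter and is supplied here as an assumption, as are the helper names; the text states only that the primers contain the T7 promoter.

```python
from typing import Tuple

# Commonly used minimal T7 promoter sequence (assumption of this sketch).
T7_PROMOTER = "TAATACGACTCACTATAGGG"

# Primer sequences as listed in the text, with line breaks removed.
PRIMERS = {
    "gfp_RNAi_F": "GAGCTCTAATACGACTCACTATAGGGAGAGTGTGAGTTATAGTTGTATTCCAAT",
    "gfp_RNAi_R": "GGTACCTAATACGACTCACTATAGGGAGAGTGGAGAGGGTGAAGGTGATGCAAC",
    "dae2_RNAi_F": "CTAGTCGAGCTCTAATACGACTCACTATAGGGAGACGCTCGTGGTCCTGGGAT",
    "dae2_RNAi_R": "CTAGTCGGTACCTAATACGACTCACTATAGGGAGAGTTGTAGTTGGGCTTCCCTGTA",
}


def split_primer(sequence: str) -> Tuple[str, str, str]:
    """Split a primer into (5' extension, T7 promoter, downstream portion)."""
    start = sequence.find(T7_PROMOTER)
    if start < 0:
        raise ValueError("no T7 promoter found in primer")
    end = start + len(T7_PROMOTER)
    return sequence[:start], sequence[start:end], sequence[end:]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        extension, promoter, downstream = split_primer(seq)
        print(f"{name}: 5' extension={extension} downstream={downstream}")
```

Because both primers of each pair carry the promoter, in vitro transcription of the PCR product yields both strands, which is what allows dsRNA to be produced from a single template.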

RNAi experiments were performed on a randomized pool of nymphs reared 
from three engorged female ticks collected from the wild. Five microlitres of the 
dae2 or gfp dsRNA (3 μg μl−1) was loaded into capillary tubes, and 0.5 μl dsRNA 
was microinjected into the gut of each unfed nymph, as recently described”. Micro- 
injected ticks were allowed to rest in a temperature-controlled humidity chamber 
for 16h and ~100 nymphs were subsequently fed on B. burgdorferi-infected female 
4-6-week-old C3H/HeN mice. Two mice were included per RNAi treatment to 
account for potential variability in B. burgdorferi infection loads. Ticks were allowed 
to feed to repletion (3-5 days) and collected within 24h (t= 0). Fed ticks were 
maintained in a temperature-controlled incubator until the indicated time point 
(2 weeks). Knockdown efficiency was analysed by RT-PCR analysis of dae2 levels 
in engorged nymphs. The RNAi treatment groups were blinded from the time of 
dsRNA injections through qPCR analyses. 
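
The comparison between RNAi treatment groups reported in Fig. 3 uses a two-tailed Mann-Whitney test. A minimal pure-Python sketch of that test is given below, using the normal approximation with tie and continuity corrections. The approximation and the function names are assumptions of this example rather than a description of the software the authors used, and a statistics package would normally be preferred for small samples where an exact p-value is available.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-tailed p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0
    # Continuity correction of 0.5, clamped so z is never negative.
    z = max((abs(u - n1 * n2 / 2) - 0.5) / math.sqrt(variance), 0.0)
    return u, math.erfc(z / math.sqrt(2))
```

A nonparametric test of this kind suits normalized qPCR ratios because it makes no assumption that the values are normally distributed.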

qPCR analyses. qPCR was performed on cDNA samples using the SsoAdvanced 
Universal SYBR Green Supermix (Biorad). Expression levels for dae genes were nor- 
malized to actin or gapdh expression levels in I. scapularis and N. gruberi, respec- 
tively. Analyses of dae gene expression include three technical replicates for N. gruberi 
and technical duplicates of three biological replicates for I. scapularis. Populations 
of B. burgdorferi and total bacteria were quantified by qPCR in tick DNA samples 
using primers targeted to flaB (B. burgdorferi-specific), the 16S rRNA gene”, 
or TROSPA (tick-specific). Biological replicates are shown for qPCR analyses of 
B. burgdorferi and total bacterial levels. Transcript or DNA copy numbers were 
calculated using a standard curve. 
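
The standard-curve calculation and the normalization of flaB to TROSPA can be sketched as below. The sketch assumes the usual linear relationship between quantification cycle (Cq) and log10 of template copies; the dilution series, variable names and functions are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def fit_standard_curve(log10_copies: Sequence[float],
                       cq_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    if n < 2 or n != len(cq_values):
        raise ValueError("need at least two matched standards")
    mean_x = sum(log10_copies) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, cq_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_cq(cq: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate template copy number."""
    return 10 ** ((cq - intercept) / slope)


def normalized_load(target_copies: float, reference_copies: float) -> float:
    """Target copies per reference copy, e.g. flaB per TROSPA."""
    return target_copies / reference_copies
```

Dividing by a tick-specific gene corrects for differences in the amount of tick material recovered from each pooled sample.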
Extended Data Figure 1 | Genomic evidence for validated eukaryotic dae 
genes. Eukaryotic dae genes from the indicated organisms are listed adjacent 
to schematic representations of available predicted open reading frames 
(colour-coded according to family as in Fig. 1a) and corresponding genomic 
context of dae genes. Flanking genes are colour-coded according to organisms 
that homologues of these genes are found in (broadly, in eukaryotes, black; 
only closely related eukaryotic species, grey; both bacteria and eukaryotes, 
white). Diagonal lines denote ends of genomic contigs. In the right column, 
splice sites (red vertical lines) and conserved intron positions (red dashed 
circles) are shown. In Oxytricha trifallax, the somatic nucleus (macronucleus) 


contains ~16,000 chromosomes and is a rearranged form of the germline 
nucleus (micronucleus)”*. The complete dae3 gene in Oxytricha is found in the 
macronucleus on a chromosome with three characteristic GGGGTTTT 
telomere sequences. Three fragments comprising the dae3 gene are found in the 
micronuclear genome (http://oxy.ciliate.org/). In Nematostella vectensis and 
Branchiostoma floridae, lineage-specific duplication events have resulted in two 
adjacent dae4 paralogues with gene names labelled (numbers). In Capitella 
teleta and Lottia gigantea, shared synteny on both sides of the dae4 gene is 
indicated (red dashed circles). 
Extended Data Figure 2 | Phylogenetic tree of bacterial tae2 and eukaryotic
dae2 genes. A phylogenetic tree was constructed using Bayesian methods in
MrBayes to compare to the maximum likelihood tree shown in Fig. 1b.
Branch support >0.7 is indicated by asterisks or by numbers. The scale bar
shows estimated divergence in amino acid changes per residue. Eukaryotic dae2
genes are indicated by dashed boxes, which highlight two separate HGT events.
In both phylogenetic trees, the two eukaryotic dae2 clades are well supported as
monophyletic clades, supporting our conclusion of two HGT events. Likewise,
many major bacterial groups are well supported in both trees. Differences in
the overall topology of the trees, mostly owing to changes in deep branches that
are not well supported in either phylogenetic tree, reflect uncertainty in the
ancient history of these genes and should therefore be treated with caution.


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 3 image: phylogenetic trees of tae3 and dae3 genes. Labelled taxa include Naegleria gruberi, Oxytricha trifallax, Ralstonia, Ralstonia solanacearum, Enterobacteriales, Delftia sp., Burkholderia, Burkholderia phytofirmans, Pseudomonas, Pseudomonadales, Xanthomonas oryzae, Bacteroidetes, Prevotella marshii, Prevotella denticola, Stigmatella aurantiaca, Proteiniphilum acetatigenes, Desulfovibrio desulfuricans and other bacterial tae3 genes. Scale bars, 0.2.]
Extended Data Figure 3 | Phylogenetic tree of bacterial tae3 and eukaryotic
dae3 genes. a, b, Phylogenetic trees were constructed using either maximum
likelihood methods (a) or Bayesian methods (b). Branch support >0.7 is
indicated by asterisks or by numbers. The scale bar shows estimated divergence
in amino acid changes per residue. Eukaryotic dae3 genes are indicated by
dashed boxes, which highlight two separate HGT events. In both trees, the two
eukaryotic dae3 clades are well supported as monophyletic clades, supporting
our conclusion of two separate HGT events. Likewise, many major bacterial
groups are well supported in both trees. Differences in the overall topology
of the trees, mostly owing to changes in deep branches that are not well
supported in either phylogenetic tree, reflect uncertainty in the ancient history
of these genes and should therefore be treated with caution.
Extended Data Figure 4 | Phylogenetic tree of bacterial tae4 and eukaryotic
dae4 genes. a, b, Phylogenetic trees were constructed using either maximum
likelihood methods (a) or Bayesian methods (b). Branch support >0.7 is
indicated by asterisks or by numbers. The scale bar shows estimated divergence
in amino acid changes per residue. Eukaryotic dae4 genes are indicated by
dashed boxes, which highlight two separate HGT events. In both trees, the two
eukaryotic dae4 clades are well supported as monophyletic clades, supporting
our conclusion of two separate HGT events. Likewise, many major bacterial
groups are well supported in both trees. Differences in the overall topology of
the trees, mostly owing to changes in deep branches that are not well supported
in either phylogenetic tree, reflect uncertainty in the ancient history of these
genes and should therefore be treated with caution.
Extended Data Figure 5 | Evidence for dae2 in other chelicerates.
a, Alignment of Dae2 from ticks and mites (I. scapularis and M. occidentalis)
with Dae2 sequences from partially assembled genomes of two scorpions
(Mesobuthus martensii and Centruroides exilicauda) and the horseshoe crab
(Limulus polyphemus). Splice junctions are denoted (vertical red lines). All
three alignable partial sequences start (red diagonal slashes) in the same
position as the shared splice site in tick and mite dae2 genes, suggesting that this
is probably the beginning of the exons in all dae genes shown. A second intron
position is shared between the tick, scorpion and horseshoe crab dae genes
and is near the mite intron position. b, Phylogenetic tree based on partial
nucleotide sequences of dae2 from the indicated chelicerate species. Scale bar
shows estimated divergence, in substitutions per nucleotide. c, Chelicerate
phylogeny with approximate dates of divergence. The unknown divergence
time of sarcoptiform and trombidiform mites is indicated by a question mark.
We find no evidence for dae2 in the complete genome of the trombidiform
mite Tetranychus urticae nor in the partial (several species) or complete
genome (Stegodyphus mimosarum) of any spider. Putative dae2 gene loss
events in trombidiform mites and spiders are denoted (dashed lines).
[Extended Data Figure 6 image: alignments of the N-terminus and catalytic dyad regions, with Tae2, Tae3 and Tae4 consensus logos shown above the eukaryotic sequences. Dae2: A. cajennense, A. triste, A. parvum, A. maculatum, R. pulchellus, R. sanguineus, I. scapularis, M. occidentalis, D. gallinae, D. pteronyssinus, D. magna, D. pulex. Dae3: N. gruberi, O. trifallax. Dae4: B. floridae, S. kowalevskii, N. vectensis, M. brevicollis, V. lienosa, E. complanata, L. gigantea, L. stagnalis, A. californica, S. constricta, M. galloprovincialis, C. gigas, C. teleta.]


Extended Data Figure 6 | Evidence for retention of important catalytic
motifs and recurrent eukaryotic-specific addition of secretion signals.
a-c, Alignments for the predicted Dae N-terminal signal sequences (shaded
blue) and catalytic motifs (right) are shown for each of the families. The
consensus sequence logo of residues surrounding the cysteine and histidine
positions of catalytic dyads from extant Tae enzymes is shown above
alignments from each family. Below are aligned eukaryotic Dae proteins in
these same regions. Representatives derived from distinct HGT events are
separated by a space. Predicted N-terminal secretion signals (blue) and
predicted catalytic residues (red) are coloured. Lowering the cut-off value in
SignalP from the default value of 0.45 to the ‘sensitive’ value of 0.34 predicted a
signal peptide in residues 1-21 of C. gigas Dae4.
Extended Data Figure 7 | Dae2 degrades mDAP- but not Lys-type PG.
a, b, Partial HPLC chromatograms of sodium-borohydride-reduced
soluble PG fragments (muropeptides) from Bacillus subtilis (a) or
Streptococcus pneumoniae (b). PG sacculi products resulting from incubation
with buffer (Control) or the indicated Dae2 proteins (wild type (WT) or
C43A), followed by cellosyl digestion are shown. Major peaks are
labelled. a, Muropeptides from B. subtilis include Tri
(GlcNAc-MurNAc(reduced (r))-L-Ala-D-γ-Glu-mDAP(amidated (NH2))), Tetra
(GlcNAc-MurNAc(r)-L-Ala-D-γ-Glu-mDAP(NH2)-D-Ala), and TetraTri
(GlcNAc-MurNAc-L-Ala-D-γ-Glu-mDAP(NH2)-D-Ala-mDAP(NH2)-D-γ-Glu-L-Ala-MurNAc(r)-GlcNAc).
b, Muropeptides from S. pneumoniae include Tri
(GlcNAc-MurNAc(r)-L-Ala-D-γ-Gln-L-Lys) and TetraTri
(GlcNAc-MurNAc-L-Ala-D-γ-Gln-L-Lys-D-Ala-L-Lys-D-γ-Gln-L-Ala-MurNAc(r)-GlcNAc).
L-Ser-L-Ala branch is indicated by ‘(SA)’ and
deacetylation by ‘(deAc)’.
Extended Data Figure 8 | Dae2 is active against B. burgdorferi PG. HPLC
elution profiles of B. burgdorferi sacculi incubated with buffer (Control) or the 
indicated Dae2 proteins (wild type (WT) or C43A), followed by cellosyl 
digestion are shown. Discrete peaks lost (red) or produced (green) upon 
digestion by Dae2 are denoted with arrowheads in control and wild-type 
chromatograms, respectively. Unresolved peaks, probably corresponding to a 
complex mixture of multi-crosslinked species cleaved by Dae2, are also 
highlighted (blue line). B. burgdorferi PG composition is complex and not yet 
resolved, thus approximate elution times of uncrosslinked versus crosslinked 
species are based on E. coli muropeptides in the same solvent system. 
Extended Data Figure 9 | Disruption of dae2 expression does not
significantly alter tick physiology at repletion. a, Knockdown of dae2 does
not increase the B. burgdorferi burden in infected nymphs at engorgement.
Loads were quantified by qPCR analysis of flaB, a B. burgdorferi-specific gene,
and normalized to TROSPA, a tick-specific gene. n = 20. For this and
subsequent panels, each data point represents a pool of three nymphs, and
horizontal bars represent mean values, which were not significantly different in
a two-tailed nonparametric Mann-Whitney test (P > 0.5). b, Disruption of
dae2 expression did not affect engorgement weights of nymphal ticks fed on
B. burgdorferi-infected mice. Tick weights were measured at repletion and 2
weeks post-repletion. Error bars show ± s.d., n = 8. c, Overall bacterial load was
not affected by knockdown of dae2. Bacterial load was assessed by qPCR
analysis of the 16S rRNA normalized against the tick-specific gene TROSPA.
Load is represented on both a linear (bottom) and log (top) scale, which is
denoted by a gap on the y-axis.
Extended Data Table 1 | Evolutionary analyses of dae and tae gene families

Group                 Gene family  Species group  Number of species  dN/dS  Codons evolving under purifying selection
Eukaryotic amidases   dae2         Ticks & mites  10                 0.20   21% (25 of 120)
Eukaryotic amidases   dae4         Mollusks       9                  0.18   40% (59 of 149)
Prokaryotic amidases  tae2         Cronobacter    10                 0.12   23% (30 of 129)
Prokaryotic amidases  tae3         Acinetobacter  10                 0.08   30% (45 of 150)
Prokaryotic amidases  tae4         Pseudomonas    12                 0.15   46% (75 of 163)

Summary of results from maximum likelihood tests of aligned dae or tae sequences from the indicated species, using SLAC in the HyPhy software package. The overall gene dN/dS ratio (ratio of non-synonymous
changes to synonymous changes) is shown, indicating an overall signature of purifying selection. Individual codons with a statistically significant signature of purifying selection (P < 0.05) were also calculated and
are expressed as a percentage of the total number of codons used in the analysis. In the same analyses, no codons were found with a statistically significant signature of positive selection.
Exome sequencing identifies rare LDLR and APOA5
alleles conferring risk for myocardial infarction 


A list of authors and their affiliations appears at the end of the paper 


Myocardial infarction (MI), a leading cause of death around the world,
displays a complex pattern of inheritance. When MI occurs early
in life, genetic inheritance is a major component to risk. Previously,
rare mutations in low-density lipoprotein (LDL) genes have been
shown to contribute to MI risk in individual families, whereas common
variants at more than 45 loci have been associated with MI risk
in the population. Here we evaluate how rare mutations contribute
to early-onset MI risk in the population. We sequenced the
protein-coding regions of 9,793 genomes from patients with MI at
an early age (≤50 years in males and ≤60 years in females) along with
MI-free controls. We identified two genes in which rare coding-sequence
mutations were more frequent in MI cases versus controls
at exome-wide significance. At low-density lipoprotein receptor (LDLR),
carriers of rare non-synonymous mutations were at 4.2-fold increased
risk for MI; carriers of null alleles at LDLR were at even higher risk
(13-fold difference). Approximately 2% of early MI cases harbour a
rare, damaging mutation in LDLR; this estimate is similar to one
made more than 40 years ago using an analysis of total cholesterol.
Among controls, about 1 in 217 carried an LDLR coding-sequence
mutation and had plasma LDL cholesterol > 190 mg dl⁻¹. At
apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations
were at 2.2-fold increased risk for MI. When compared with
non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol,
whereas APOA5 mutation carriers had higher plasma triglycerides.
Recent evidence has connected MI risk with coding-sequence
mutations at two genes functionally related to APOA5, namely
lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined,
these observations suggest that, as well as LDL cholesterol, disordered
metabolism of triglyceride-rich lipoproteins contributes to MI risk.

The US National Heart, Lung, and Blood Institute’s exome sequenc- 
ing project (ESP) sought to use exome sequencing as a tool to identify 
genes and mechanisms contributing to heart, lung and blood disorders. 
Within this program, we designed a discovery study for the extreme 
phenotype of early-onset MI (Fig. 1), as heritability is substantially greater
when MI occurs early in life. From eleven studies, we identified 1,088
cases with MI at an early age (MI in males ≤50 years old and in females
≤60 years old). As a comparison group, we selected 978 participants
from prospective cohort studies who were of advanced age (males ≥60
years old or females ≥70 years old) and free of MI.

We sequenced cases and controls to high coverage by performing 
solution-based hybrid selection of exons followed by massively parallel 
sequencing (see Methods). We performed several quality control steps
to identify and remove outlier samples and variants (see Methods and 
Supplementary Figs 1-13). Characteristics of the discovery set of 1,027 
cases and 946 controls are provided in Supplementary Tables 1-3. Across 
the autosomes, each participant had an average of 43 nonsense, 7,828 
missense, 92 splice-site, 189 insertion or deletion (indel) frameshift, 366 
indel non-frameshift, and 103 non-synonymous singleton variants. 

We first tested whether low-frequency coding variants (defined here 
as a single nucleotide variant (SNV) or indel with minor allele frequency 
(MAF) between 1% and 5%) are associated with risk for MI in the dis- 
covery sequencing study. We observed no significant association of MI 
status with any individual variant (Supplementary Fig. 14). We next 
evaluated the hypothesis that rare alleles (defined here as a SNV or
indel with MAF <1%) collectively within a gene contribute to risk for
MI (see Methods). We tested for an excess (or deficit) in cases versus
controls of rare, non-synonymous mutations by aggregating together
SNVs and indels with MAF <1% (‘T1’ test) in each gene and comparing
the counts in cases and controls. Empirical P values were obtained
using permutation.
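
The T1 aggregation and permutation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the function name `t1_burden_permutation`, the per-individual allele-count input and the choice of test statistic (absolute difference in mean rare-allele count between cases and controls) are all hypothetical.

```python
import random


def t1_burden_permutation(allele_counts, is_case, n_perm=10000, seed=0):
    """Empirical two-sided P value for a rare-allele burden difference.

    allele_counts: rare (MAF < 1%) allele count per individual for one gene.
    is_case: parallel list of booleans (True = case, False = control).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    total_alleles = sum(allele_counts)

    def statistic(labels):
        in_cases = sum(c for c, lab in zip(allele_counts, labels) if lab)
        in_controls = total_alleles - in_cases
        # Absolute difference captures both an excess and a deficit in cases.
        return abs(in_cases / n_cases - in_controls / n_controls)

    observed = statistic(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between genotype and phenotype
        if statistic(labels) >= observed:
            hits += 1
    # Add-one correction keeps the empirical P value above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: 20 of 40 cases carry one rare allele, no control does.
counts = [1] * 20 + [0] * 60
status = [True] * 40 + [False] * 40
p_value = t1_burden_permutation(counts, status, n_perm=1000)
```

With this scheme the smallest attainable P value is 1/(n_perm + 1), so reaching an exome-wide threshold requires far more permutations than the toy call above uses.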

The need to aggregate rare variants requires consideration of which
variants should be studied together. Ideally, one would aggregate only harmful
alleles and ignore benign alleles. To enrich for harmful alleles, we
considered three sets of variants: (1) non-synonymous only; (2) a ‘deleterious
(PolyPhen)’ set consisting of non-synonymous after excluding
missense alleles annotated as benign by PolyPhen-2 HumDiv software;
and (3) ‘disruptive’ mutations only (nonsense, indel frameshift, splice-site;
also referred to as ‘null’ mutations). To account for multiple testing,
we set exome-wide significance for this study at P = 8 × 10⁻⁷, a
Bonferroni correction for the testing of ~20,000 genes and three variant
sets. When the T1 test was applied across these three sets of alleles in the
discovery sequencing study, no gene-based association signal deviated
from what we expected by chance (Supplementary Figs 15-22).
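
The exome-wide threshold follows from simple arithmetic, sketched here. The family-wise error rate of 0.05 is an assumption (the text gives only the resulting threshold), and `bonferroni_threshold` is a hypothetical helper name.

```python
def bonferroni_threshold(alpha, n_genes, n_variant_sets):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / (n_genes * n_variant_sets)


# Assumed family-wise alpha of 0.05, ~20,000 genes, three variant sets.
threshold = bonferroni_threshold(0.05, 20_000, 3)
print(f"{threshold:.1e}")  # 8.3e-07
```

The result, about 8.3 × 10⁻⁷, is consistent with the stated threshold of 8 × 10⁻⁷.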
Figure 1 | Overall design for the early-onset myocardial infarction study
within the US National Heart, Lung, and Blood Institute’s exome 
sequencing project (ESP). Whole exome sequencing was performed in 
1,973 individuals from the phenotypic extremes. To test the hypothesis that 
low-frequency variants confer risk for myocardial infarction (MI), we 
performed follow-up statistical imputation and array-based genotyping of 
single nucleotide variants. To test the hypothesis that a burden of rare 
mutations in a gene confers risk for MI, we performed targeted re-sequencing 
and additional exome sequencing. 
Table 1 | Association of a burden of rare mutations in APOA5 with risk for early-onset myocardial infarction or coronary artery disease

Mutation set            n cases/controls  T1 cases  T1 controls  Freq cases (%)  Freq controls (%)  OR   P
Non-synonymous          6,721/6,711       93        42           1.4             0.63               2.2  5 × 10⁻⁷
Deleterious (PolyPhen)  6,721/6,711       63        31           0.94            0.46               2.0  6 × 10^?
Deleterious (broad)     6,721/6,711       68        31           1.0             0.46               2.2  2 × 10^?
Deleterious (strict)    6,721/6,711       10        3            0.15            0.045              3.3  0.008
Disruptive              6,721/6,711       9         2            0.13            0.03               4.5  0.007

Summary allele counts and carrier frequencies are shown. Only SNVs and indels with minor allele frequency less than 1% were considered in burden analysis. ‘Deleterious (PolyPhen)’ as defined by nonsense,
splice-site, indel frameshift, and missense annotated as ‘possibly damaging’ or ‘probably damaging’ by PolyPhen-2 HumDiv software; ‘deleterious (broad)’ as defined by nonsense, splice-site, indel frameshift,
and missense annotated as deleterious by at least one of the five protein prediction algorithms of LRT score, MutationTaster, PolyPhen-2 HumDiv, PolyPhen-2 HumVar and SIFT; ‘deleterious (strict)’ as defined by
nonsense, splice-site, indel frameshift, and missense annotated as deleterious by all five protein prediction algorithms; ‘disruptive’ defined as nonsense, splice-site or indel frameshift; T1: alleles from SNVs or indels
with minor allele frequency less than 1%; Freq (%): percentage of cases or controls carrying a T1 allele; OR: odds ratio.


We followed up on discovery sequencing results in four ways: (1) 
statistical imputation; (2) array-based genotyping using the Illumina 
HumanExome Beadchip (‘Exome’ chip); (3) targeted re-sequencing; and 
(4) additional exome sequencing (Fig. 1). Imputation and array-based
genotyping were used mainly to evaluate low-frequency variants, whereas
targeted re-sequencing and exome sequencing were used to test the
role of rare mutations.

With the first and second follow-up approaches, imputation (n =
64,132) and array-based genotyping (n = 15,936), respectively, we did
not identify novel low-frequency variants associated with MI or coronary
artery disease (CAD) (see Methods, Supplementary Tables 4-7 and 
Supplementary Figs 23-27). The top association results for SNVs from 
array-based genotyping are shown in Supplementary Table 8. 

In the third follow-up approach, we re-sequenced several genes in 
additional cases and controls (see Methods, Supplementary Table 9). 
After sequencing the exons of APOA5 in 6,721 cases and 6,711 controls,
we identified 46 unique non-synonymous or splice-site SNVs or indel 
Figure 2 | Apolipoprotein A-V (APOA5) mutations discovered after
sequencing of 13,432 individuals. Individual mutations (non-synonymous,
indel frameshift and splice-site variants with minor allele frequency less than
1%) are depicted according to the genomic position along the length of the
APOA5 gene starting at the 5’ end (top). The number of circles on the left and
right represents the number of times that mutation is observed in cases or 
controls, respectively. Dashed lines across the gene connect the same mutation 
seen in both cases and controls. Mutations are shaded in red (observed in 
cases only), blue (observed in controls only) or yellow (observed in both cases 
and controls). 


frameshifts with allele frequency < 1% (Supplementary Table 10). Based
on these variants, we observed 93 alleles in cases and 42 alleles in controls
(P = 5 × 10⁻⁷; Table 1, Fig. 2 and Supplementary Table 10). This
burden of rare mutation signal was primarily driven by mutations seen 
in one or two study participants (Fig. 2 and Supplementary Table 10). 
Carriers of a rare APOA5 mutation had a 2.2-fold higher risk for MI/ 
CAD than non-carriers (Table 1). 
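
As a rough check on the reported effect size, a crude odds ratio can be computed from the counts in Table 1. The sketch below assumes that each T1 allele sits in a distinct carrier and applies no covariate adjustment, so it is not necessarily how the published odds ratios were estimated; `carrier_odds_ratio` is a hypothetical helper name.

```python
def carrier_odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Unadjusted odds ratio of carrying a rare allele, cases versus controls."""
    case_odds = case_carriers / (n_cases - case_carriers)
    control_odds = control_carriers / (n_controls - control_carriers)
    return case_odds / control_odds


# Non-synonymous APOA5 set from Table 1: 93 of 6,721 cases, 42 of 6,711 controls.
or_nonsynonymous = carrier_odds_ratio(93, 6721, 42, 6711)
print(round(or_nonsynonymous, 1))  # 2.2
```

Under the same assumptions, the disruptive set (9 versus 2 alleles) gives a crude odds ratio of about 4.5, also in line with Table 1.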

According to a recent report, consideration of variant sets based on
multiple protein prediction algorithms might yield stronger association
signals. Therefore, we investigated two additional variant sets: (1) ‘deleterious
(broad)’ as defined by nonsense, splice-site, indel frameshift,
Figure 3 | Low-density lipoprotein receptor (LDLR) mutations discovered
after sequencing 9,793 individuals. a, Individual disruptive mutations 
(nonsense, indel frameshift, and splice-site variants with minor allele frequency 
less than 1%) are depicted according to the genomic position along the length of 
the LDLR gene starting at the 5’ end (top). The number of circles on the left 
and right represents the number of times that mutation is observed in cases or 
controls, respectively. Mutations are shaded in red if observed in cases only 
or blue if observed in controls only. b, LDL cholesterol level as observed in 
different LDLR gene mutation annotation categories. Mean (height of bar) and 
95% confidence intervals (error bars) are shown. Each individual is categorized 
based on mutation annotation as follows. Non-carriers: individuals without a
missense or disruptive mutation; deleterious (PolyPhen) as defined by
nonsense, splice-site, indel frameshift, and missense annotated as ‘possibly 
damaging’ or ‘probably damaging’ by PolyPhen-2 HumDiv software; 
‘deleterious (broad)’ as defined by nonsense, splice-site, indel frameshift, and 
missense annotated as deleterious by at least one of five protein prediction 
algorithms (LRT score, MutationTaster, PolyPhen-2 HumDiv, PolyPhen-2 
HumVar and SIFT); ‘deleterious (strict)’ as defined by nonsense, splice-site, 
indel frameshift, and missense annotated as deleterious by all five of the above 
protein prediction algorithms; disruptive: carriers of mutations that are 
nonsense, indel frameshift, or splice-site. 
Table 2 | Association of a burden of rare mutations in LDLR with risk for early-onset myocardial infarction or coronary artery disease

Mutation set            n cases/controls  T1 cases  T1 controls  Freq cases (%)  Freq controls (%)  OR    P
Non-synonymous          4,703/5,090       285       208          6.1             4.1                1.5   4 × 10^?
Deleterious (PolyPhen)  4,703/5,090       148       67           3.1             1.3                2.4   1 × 10^?
Deleterious (broad)     4,703/5,090       243       158          5.2             3.1                1.7   9 × 10⁻⁸
Deleterious (strict)    4,703/5,090       90        23           1.9             0.45               4.2   3 × 10^?
Disruptive              4,703/5,090       24        2            0.51            0.039              13.0  9 × 10^?

Summary allele counts and carrier frequencies are shown. Only SNVs and indels with minor allele frequency less than 1% were considered in burden analysis. ‘Deleterious (PolyPhen)’ as defined by nonsense,
splice-site, indel frameshift, and missense annotated as ‘possibly damaging’ or ‘probably damaging’ by PolyPhen-2 HumDiv software; ‘deleterious (broad)’ as defined by nonsense, splice-site, indel frameshift,
and missense annotated as deleterious by at least one of the five protein prediction algorithms of LRT score, MutationTaster, PolyPhen-2 HumDiv, PolyPhen-2 HumVar and SIFT; ‘deleterious (strict)’ as defined by
nonsense, splice-site, indel frameshift, and missense annotated as deleterious by all five protein prediction algorithms; ‘disruptive’ defined as nonsense, splice-site or indel frameshift; T1: alleles from SNVs or indels
with minor allele frequency less than 1%; Freq (%): percentage of cases or controls carrying a T1 allele; OR: odds ratio.


and missense annotated as damaging by at least one of five protein
prediction algorithms; and (2) ‘deleterious (strict)’ as defined by nonsense,
splice-site, indel frameshift, and missense annotated as damaging
by all five protein prediction algorithms (see Methods). Carriers of a
rare APOA5 deleterious (strict) mutation had an even higher risk for
MI/CAD (3.3-fold, P = 0.008).

A burden of rare mutations in APOA5 explains about 0.14% of the
total variance for MI and roughly 0.28% of the heritability (assuming
that additive genetic factors explain ~50% of the overall variance) (see
Methods and Supplementary Table 11). When compared with non-carriers,
carriers of rare non-synonymous APOA5 alleles had higher plasma
triglycerides (median in carriers was 167 mg dl⁻¹ versus 104 mg dl⁻¹
for non-carriers, P = 0.007) and lower high-density lipoprotein cholesterol
(mean in carriers was 43 mg dl⁻¹ versus 57 mg dl⁻¹ for non-carriers,
P = 0.007), but similar LDL cholesterol (median in carriers
was 110 mg dl⁻¹ versus 108 mg dl⁻¹ for non-carriers, P = 0.66)
(Supplementary Table 12).
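
The step from variance explained to share of heritability is a single division, sketched below. The 50% figure is the assumption stated in the text; the function name `share_of_heritability` is hypothetical.

```python
def share_of_heritability(variance_explained_pct, additive_genetic_fraction=0.5):
    """Percentage of heritability accounted for by a burden signal.

    variance_explained_pct: percentage of total phenotypic variance explained.
    additive_genetic_fraction: fraction of total variance assumed to be
    additive genetic (0.5 follows the ~50% assumption in the text).
    """
    return variance_explained_pct / additive_genetic_fraction


print(share_of_heritability(0.14))  # 0.28
```

If additive genetic factors explained a different fraction of the variance, the share of heritability would scale inversely with that fraction.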

In the fourth follow-up approach, we performed exome sequencing 
in additional early-onset MI/CAD cases and controls, bringing the 
total number of exomes analysed to 9,793 (Supplementary Tables 13 
and 14). We tested for an excess (or deficit) in cases versus controls of 
rare mutations in any gene (Supplementary Fig. 28 and Supplementary 
Tables 15-17). At this sample size, rare alleles collectively conferred 
risk for MI at exome-wide significance in only one gene, LDLR (Fig. 3). 

After sequencing the exons of LDLR in 4,703 cases and 5,090 controls,
we identified 156 unique non-synonymous, splice-site SNVs and
indel frameshifts with allele frequency <1% (Table 2 and Supplementary
Table 18). Of these variants, we observed 285 alleles in cases (6.1%
of cases) and 208 alleles in controls (4.1% of controls) (1.5-fold effect
size, P = 4 × 10^?) (Table 2). When restricting analysis to the deleterious
(PolyPhen) set, 3.1% of cases and 1.3% of controls carried at least
one such rare mutation, for a 2.4-fold effect size (P = 1 × 10^?). A
higher effect size of 4.2-fold (P = 3 × 10^?) was observed when restricting
to the deleterious (strict) set. When restricting to disruptive alleles,
0.51% of cases and 0.04% of controls carried at least one such rare
disruptive mutation, for a 13-fold effect size (P = 9 × 10^?) (Table 2
and Fig. 3).

Among controls, approximately 1 in 217 individuals carried an LDLR
non-synonymous or disruptive mutation and had LDL cholesterol
> 190 mg dl⁻¹; in contrast, among cases, approximately 1 in 51 individuals
carried an LDLR non-synonymous or disruptive mutation and
had LDL cholesterol > 190 mg dl⁻¹.

A burden of rare mutations in LDLR explains about 0.24% of the
total variance for MI and roughly 0.48% of the heritability (see Methods
and Supplementary Table 19). LDL cholesterol level differed based on
functional class annotation with the greatest difference seen between
carriers of disruptive mutations and those who did not carry any non-synonymous
mutations (279 mg dl⁻¹ versus 135 mg dl⁻¹, Fig. 3 and
Supplementary Table 20). Approximately 49% of the LDLR alleles discovered
in this study (77 of 156) have been previously observed in LDLR
familial hypercholesterolemia databases (Supplementary Table 21).

Using these rare variant signals as a guide, we estimated sample sizes 
that will be required to make similar discoveries. A very large number of 
samples, at least 10,000 exomes, are required to achieve 80% statistical 
power at an exome-wide level of statistical significance (Supplementary
Figs 29-31). 
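
To give a feel for why so many samples are needed, the sketch below approximates the power of a two-sided two-proportion z-test at the exome-wide threshold. It is not the authors' power analysis (which is given in Supplementary Figs 29-31). The normal approximation, the equal group sizes and the carrier frequencies (taken from the non-synonymous APOA5 row of Table 1 purely for illustration) are assumptions, and `burden_test_power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def burden_test_power(freq_cases, freq_controls, n_per_group, alpha=8e-7):
    """Approximate power of a two-sided two-proportion z-test.

    freq_cases, freq_controls: assumed true carrier frequencies.
    n_per_group: number of cases, assumed equal to the number of controls.
    alpha: two-sided significance threshold (exome-wide by default).
    """
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)
    standard_error = sqrt(
        freq_cases * (1 - freq_cases) / n_per_group
        + freq_controls * (1 - freq_controls) / n_per_group
    )
    expected_z = abs(freq_cases - freq_controls) / standard_error
    # Chance that the statistic clears the critical value; the opposite
    # tail contributes a negligible amount and is ignored.
    return standard_normal.cdf(expected_z - z_critical)


# Illustrative carrier frequencies of 1.4% in cases and 0.63% in controls.
for n in (1_000, 5_000, 20_000):
    print(n, round(burden_test_power(0.014, 0.0063, n), 3))
```

Under these assumptions power is close to zero at 1,000 per group and approaches one only in the tens of thousands, because the stringent threshold demands a large standardized effect.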

Here we show that a burden of rare alleles in two genes, LDLR and
APOA5, contributes to risk for MI. These results suggest several conclusions
regarding the inherited basis for MI and rare variant association
studies. First, after a DNA sequence-based search across nearly all protein-coding
genes in >9,700 early-onset MI cases and controls, LDLR is the
strongest association signal, with mutations in the gene accounting for
about 2% of cases. In 1973, Goldstein and colleagues studied survivors
of early MI and noted two common lipid abnormalities: hypercholesterolemia
and hypertriglyceridemia. On the basis of a total cholesterol
value exceeding ~285 mg dl⁻¹, it was estimated that 4.1% of cases with
MI prior to the age of 60 had familial hypercholesterolemia; this original
estimate is similar to ours based on direct sequencing. In contrast,
the prevalence of harmful LDLR mutations in the general population is
higher than the original estimate (~0.5% in the present study versus
0.1-0.2% by Goldstein). Second, the rare variant association signal presented
here establishes APOA5 as a bona fide MI gene. Initially discovered
through comparative genomics analysis of a region harbouring several
lipid regulators (that is, APOA1 and APOC3), the APOA5 locus harbours
common variants associated with plasma triglycerides. Candidate
gene and genome-wide association studies have associated common
variants at this locus also with MI risk (that is, −1131T>C, APOA5
promoter region, rs662799, MAF of 8%). However, because of extensive
linkage disequilibrium in this region, it had been previously
uncertain which gene is responsible for the association with MI. The
identification of multiple coding sequence variants within APOA5 clarifies
that this gene contributes to MI risk in the population. Third, these
data point to a route to MI beyond LDL cholesterol, namely triglyceride-rich
lipoproteins and the lipoprotein lipase pathway. Genetic variation
at two other proteins related to APOA5 function, apolipoprotein C-III
(refs 18, 19, 28) and lipoprotein lipase, has been associated with triglycerides
and MI risk. Finally, the present study makes clear that rare
variant discovery for complex disease will require the sequencing of
thousands of cases and careful statistical analysis. Two reasons for the
large sample size requirement are an inability to readily distinguish
harmful from benign alleles and the extreme rarity of harmful alleles.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


General overview of the Exome Sequencing Project (ESP). Details of the study
design of the National Heart, Lung and Blood Institute’s GO exome sequencing
project (NHLBI ESP) have been published previously. Briefly, the goal of the
NHLBI ESP was to discover rare coding variation in genes contributing to heart,
lung and blood disorders using next-generation sequencing of the protein-coding
regions of the genome (exome sequencing). The study includes five primary
groups: Seattle GO (University of Washington, Seattle, Washington); Broad GO
(Broad Institute, Cambridge, Massachusetts); WHISP (Ohio State University 
Medical Center, Columbus, Ohio); Lung GO (University of Washington, Seattle, 
Washington); Heart GO (University of Virginia Health System, Charlottesville, 
Virginia) and two collaborating groups, WashU GO (Washington University, St 
Louis) and CHARGE-S GO (University of Texas Health Sciences Center, Houston, 
Texas). 

We included samples from several studies: Women’s Health Initiative (WHI); 
Framingham Heart Study (FHS); Jackson Heart Study (JHS); Multi-Ethnic Study 
of Atherosclerosis (MESA); Atherosclerosis Risk in Communities (ARIC);
Coronary Artery Risk development in Adults (CARDIA); Cardiovascular Health Study
(CHS); Lung Health Study (LHS); COPD genetic epidemiology (COPD Gene); 
severe asthma research project (SARP); pulmonary arterial hypertension (PAH); 
acute lung injury (ALI); cystic fibrosis (CF); Cleveland Clinic GeneBank (CCGB); 
Massachusetts General Hospital premature coronary artery disease study (MGH 
PCAD); Heart Attack Risk in Puget Sound (HARPS); Translational Research In- 
vestigating Underlying Disparities in Acute Myocardial Infarction Patients’ Health 
Status (TRIUMPH) and the PennCath study. 

General overview of the ESP early-onset myocardial infarction study. Within 
the NHLBI ESP, we designed an exome sequencing experiment specifically to study 
early-onset myocardial infarction (EOMI). We selected EOMI cases and controls 
from eleven studies, including: ARIC, MESA, CCGB, FHS, HARPS, MGH PCAD, 
PennCath, TRIUMPH, WHI, CHS, and JHS (Supplementary Tables 1-3). Samples 
were selected based on the extreme tails of the phenotypic distribution, in order to 
enrich for a genetic contribution to disease. EOMI cases were defined as individuals 
who had an MI at an age of ≤50 for men and <60 for women. Controls were selected 
as individuals with no history of MI at baseline or during follow-up to at least age 60 
for men and 70 for women. The study samples, along with case and control defini- 
tions, are briefly described below and shown in Supplementary Tables 1-3. 
Study and phenotype descriptions for ESP EOMI 

The HeartGO consortium. HeartGO is a multiethnic consortium consisting of six 
NHLBI population-based cohorts of men and women: ARIC, CHS, FHS, CARDIA, 
JHS, and MESA. The age range of participants in these six cohorts spans the spec- 
trum from early adulthood to old age, providing a broad age representation. Each 
participating cohort in HeartGO has completed ascertainment of multiple pheno- 
types, including all of the major cardiovascular risk factors (blood pressure, lipids, 
diabetes status), biomarkers including measures of blood cell counts, subclinical 
disease imaging, and cardiovascular and lung outcomes including MI and stroke. 
Participants in all six cohorts provided written informed consent. The NIH data- 
base of genotypes and phenotypes (dbGaP) site contains further details regarding 
the phenotypes accessible for each individual HeartGO cohort. 

Cleveland Clinic GeneBank (CCGB). The CCGB study is a single-centre prospec- 
tive cohort-based study that enrolled patients undergoing elective diagnostic cor- 
onary angiography between 2001 and 2006. 

Heart Attack Risk in Puget Sound (HARPS). The HARPS study is a population- 
based case-control study that enrolled cases with incident MI presenting to a net- 
work of hospitals in the metropolitan Seattle-Puget Sound region of Washington 
State between 1998 and 2002. 

The Massachusetts General Hospital premature coronary artery disease (MGH 
PCAD) study. The MGH PCAD study is a hospital-based case-control study that 
enrolled cases hospitalized with early MI at MGH between 1999 and 2004. 
PennCath. The PennCath study is a catheterization-lab based cohort study from 
the University of Pennsylvania Medical Center and enrolled subjects at the time of 
cardiac catheterization and coronary angiography between 1998 and 2003. Persons 
undergoing cardiac catheterization at either the Hospital of the University of 
Pennsylvania or Penn Presbyterian Medical Center consented for the PennCath 
study to identify genetic and biochemical factors related to coronary disease. 
The Translational Research Investigating Underlying Disparities in Acute Myo- 
cardial Infarction Patients’ Health Status (TRIUMPH). The TRIUMPH study is 
a large, prospective, observational cohort study of consecutive patients with acute 
MI presenting to 24 US hospitals from April 2005 to December 2008. MI was 
diagnosed using contemporary definitions” and all patients had an elevated tro- 
ponin blood test. 

Women’s Health Initiative (WHI). WHI is a major research program that has been 
ongoing for over 20 years to address the most common causes of death, disability 
and poor quality of life in postmenopausal women: cardiovascular disease, cancer, 
and osteoporosis. 

Studies involved in follow-up statistical imputation, array-based genotyping, 
targeted re-sequencing and additional exome sequencing 

Statistical imputation. We performed statistical imputation of single nucleotide 
variants (SNVs) discovered in the exomes of the first 786 samples. We imputed exonic 
SNVs into 64,132 independent samples in 16 studies to test for association of coding 
SNVs with MI or CAD. The studies are described in Supplementary Table 5. 
Array-based genotyping. We performed follow-up array-based genotyping using 
the Illumina HumanExome Beadchip (‘exome chip’) array in 15,936 independent 
samples from seven studies. The studies are described in Supplementary Table 7. 
Targeted re-sequencing. We performed targeted re-sequencing of the APOA5 gene 
in an additional 11,414 individuals from five cohorts. The studies are described in 
Supplementary Table 9. 

Exome sequencing-based follow-up. We performed exome sequencing in addi- 
tional individuals from three cohorts. The studies are described in Supplementary 
Table 13. 

Detailed methods for the processing and analysis of samples for the various 
stages of the project are described below. We describe methods for the different stages 
of the project, including discovery exome sequencing, follow-up imputation, array- 
based genotyping, targeted re-sequencing and additional exome sequencing. 
Laboratory methods for discovery exome sequencing in the ESP EOMI Project. 
Exome sequencing. Exome sequencing was performed at the Broad Institute. Sequenc- 
ing and exome capture methods have been previously described”. A brief descrip- 
tion of the methods is provided below. 

Receipt/quality control of sample DNA. Samples were shipped to the Biological 
Samples Platform laboratory at the Broad Institute of MIT and Harvard. DNA con- 
centration was determined by the Picogreen assay (Invitrogen) before storage in 
2D-barcoded 0.75 ml Matrix tubes at −20 °C in the SmaRTStore (RTS, Manchester, 
UK) automated sample handling system. We performed initial quality control (QC) 
on all samples involving sample quantification (PicoGreen), confirmation of high- 
molecular weight DNA and fingerprint genotyping and gender determination 
(Illumina iSelect). Samples were excluded if the total mass, concentration, integ- 
rity of DNA or quality of preliminary genotyping data was too low. 

Library construction and in-solution hybrid selection. Starting with 3 µg of 
genomic DNA, library construction and in-solution hybrid selection were performed 
as described previously*’. A subset of samples, however, was prepared using this 
protocol with some slight modifications. Initial genomic DNA input into shearing 
was reduced from 3 µg to 100 ng in 50 µl of solution. In addition, for adaptor 
ligation, Illumina paired-end adapters were replaced with palindromic forked 
adapters with unique 8-base index sequences embedded within the adaptor. 
Preparation of libraries for cluster amplification and sequencing. After in- 
solution hybrid selection, libraries were quantified using qPCR (KAPA Biosystems) 
with probes specific to the ends of the adapters. This assay was automated using 
Agilent’s Bravo liquid handling platform. Based on qPCR quantification, libraries 
were normalized to 2 nM and then denatured with 0.1 N NaOH on Perkin-Elmer’s 
MultiProbe liquid handling platform. A subset of the samples prepared 
using forked, indexed adapters was quantified using qPCR, normalized to 2 nM 
using Perkin-Elmer’s Mini-Janus liquid handling platform, and pooled by equal 
volume using the Agilent Bravo. Pools were then denatured using 0.1 N NaOH. 
Denatured samples were diluted into strip tubes using the Perkin-Elmer MultiProbe. 
Cluster amplification and sequencing. Cluster amplification of denatured tem- 
plates was performed according to the manufacturer’s protocol (Illumina) using either 
Genome Analyzer v3, Genome Analyzer v4, or HiSeq 2000 v2 cluster chemistry 
and flowcells. After cluster amplification, SYBR green dye was added to all flowcell 
lanes, and a portion of each lane was visualized using a light microscope to 
confirm target cluster density. Flowcells were sequenced either on Genome Analyzer 
II using v3 and v4 Sequencing-by-Synthesis Kits, then analysed using RTA v1.7.48, 
or on HiSeq 2000 using HiSeq 2000 v2 Sequencing-by-Synthesis Kits, then analysed 
using RTA v1.10.15. All samples were run on 76-cycle, paired-end runs. For 
samples prepared using forked, indexed adapters, Illumina’s Multiplexing Sequenc- 
ing Primer Kit was also used. 

Read mapping and variant analysis. Samples were processed from real-time 
basecalls (RTA 1.7 software [Bustard]), converted to qseq.txt files, and aligned to a human 
reference (hg19) using the Burrows-Wheeler Aligner (BWA, see ref. 32). Aligned reads 
duplicating the start position of another read were flagged as duplicates and not 
analysed (‘duplicate removal’). Data was processed using the Genome Analysis 
ToolKit (GATK v1.1.3, ref. 33). Reads were locally realigned (GATK IndelRealigner) 
and their base qualities were recalibrated (GATK TableRecalibration). Variant 
detection and genotyping were performed on both exomes and flanking 50 base 
pairs of intronic sequence using the UnifiedGenotyper (UG) tool from the GATK. 
Variant data for each sample was formatted (variant call format (VCF)) as ‘raw’ 
calls for all samples. SNVs and indel sites were flagged using the variant filtration 
walker (GATK) to mark sites of low quality that were likely false positives. SNVs 
were marked as potential errors if they exhibited strong strand bias (SB ≥ 0.10), 
low average quality (quality per depth of coverage (QD) < 5.0), or fell in a 
homopolymer run (HRun > 4). Indels were marked as potential errors for low quality 
(quality score (QUAL) < 30.0), low average quality (QD < 2.0), or if the site 
exhibited strong strand bias (SB > −1.0). Samples were considered complete when 
exome targeted read coverage was ≥ 20× over ≥ 80% of the exome target. 
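
The site-level flags described above can be sketched as a small rule table. This is only an illustration of the stated cut-offs, not the GATK variant filtration walker itself: the `Site` record, the flag names and the function name are hypothetical, and the comparison for the SNV strand-bias cut-off is assumed to be "greater than or equal to 0.10".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    """Hypothetical per-site record holding the annotations named in the text."""
    kind: str           # "snv" or "indel"
    sb: float           # strand-bias score (SB)
    qd: float           # quality per depth of coverage (QD)
    hrun: int = 0       # homopolymer run length (HRun), used for SNVs
    qual: float = 0.0   # site quality score (QUAL), used for indels


def filter_flags(site: Site) -> List[str]:
    """Return the list of filters a site fails (an empty list means it passes)."""
    flags: List[str] = []
    if site.kind == "snv":
        if site.sb >= 0.10:      # assumed direction of the SNV strand-bias cut-off
            flags.append("StrandBias")
        if site.qd < 5.0:
            flags.append("LowQD")
        if site.hrun > 4:
            flags.append("HomopolymerRun")
    elif site.kind == "indel":
        if site.qual < 30.0:
            flags.append("LowQual")
        if site.qd < 2.0:
            flags.append("LowQD")
        if site.sb > -1.0:
            flags.append("StrandBias")
    else:
        raise ValueError(f"unknown site kind: {site.kind!r}")
    return flags


if __name__ == "__main__":
    # Made-up annotation values, for illustration only.
    print(filter_flags(Site(kind="snv", sb=0.2, qd=3.0, hrun=6)))
    print(filter_flags(Site(kind="indel", sb=-10.0, qd=8.0, qual=90.0)))
```

Flagging rather than deleting sites mirrors the text, which says sites were "marked" as potential errors.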

Data analysis QC. Fingerprint concordance between sequence data and 
fingerprint genotypes was evaluated. Variant calls were evaluated on both bulk and 
per-sample properties: novel and known variant counts, transition-transversion (TS-TV) 
ratio, heterozygous-homozygous non-reference ratio, and deletion/insertion ratio. 
Both bulk and sample metrics were compared to historical values for exome sequenc- 
ing projects at the Broad Institute. No significant deviation of the ESP variants or 
ESP samples from historical values was noted. 

Data processing, quality control and association analysis of discovery exome 
sequencing 

Variant calling. Variants (SNVs and indels) were identified and genotyped from 
recalibrated BAM files** using the multi-sample processing mode of the Unified 
Genotyper tool from the GATK. Variants were first identified and genotyped in 
random batches of 100 samples. The batches were then merged into a single VCF 
file using the GATK CombineVariants tool. 

Variant annotation. Variants (SNVs and indels) were annotated using the GRCh37.64 
database using the SNP effect predictor tool (SnpEff, see ref. 35) and the GATK 
VariantAnnotator. The primary SnpEff genomic effects that were annotated include: 
splice-site acceptor, splice-site donor, indel frameshift, indel non-frameshift, non- 
sense, non-synonymous and synonymous variants. For variants that have different 
annotations due to multiple transcripts of the gene, the highest impact effect for 
each variant was taken. 

Sample level quality control. We performed several quality control steps to iden- 
tify and remove outlier samples (Supplementary Figs 1-8). First, we required that 
each sample had a minimum of 20-fold coverage for at least 80% of the targeted bases. 
Second, we compared self-reported ancestry with that inferred from the sequence 
data and removed discordant samples. Third, we removed samples with a high degree 
of heterozygosity and a low number of singletons, as this pattern suggests DNA 
contamination across samples. Fourth, we removed samples with an extremely 
high number of variants or singletons as this can suggest low quality DNA. Finally, 
we removed samples exhibiting a mismatch between the reported gender and that 
inferred from sequence data. Of 2,066 cases and controls sequenced across the exome, 
we removed 93 samples due to these exclusion criteria. 

Variant level quality control. QC measures were also performed to remove low 
quality variants. We assessed population genetics metrics including the TS-TV ratio, 
the ratio of the number of heterozygous changes to the number of homozygous non- 
reference changes, and the number of non-synonymous to the number of synonym- 
ous changes. This analysis can help filter false positive calls since we expect the true 
TS-TV to be around 3.2 in European populations”, while a set of random SNVs 
(or false positive variants) should give a random expectation of 0.5. Variants with 
low depth of coverage (DP) and high percent missingness generally had low TS- 
TV and heterozygous-homozygous non-reference ratios. Variants were removed 
if there was DP <8 average per sample and >2% missingness (Supplementary Figs 
9-12). Distribution of allele frequencies of the SNVs is shown in Supplementary 
Fig. 13. 
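
The random expectation of 0.5 quoted above follows from counting: of the 12 possible single-base changes, 4 are transitions (A↔G, C↔T) and 8 are transversions. A minimal sketch of the ratio, with a hypothetical function name and input format (a list of reference/alternate base pairs):

```python
from itertools import permutations

# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def ts_tv_ratio(snvs):
    """Transition/transversion ratio for an iterable of (ref, alt) base pairs."""
    ts = tv = 0
    for ref, alt in snvs:
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-nucleotide change: {ref}>{alt}")
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("ratio undefined: no transversions observed")
    return ts / tv


if __name__ == "__main__":
    # Every possible base change exactly once: 4 transitions, 8 transversions.
    all_changes = list(permutations("ACGT", 2))
    print(ts_tv_ratio(all_changes))  # 4 / 8 = 0.5
```

A call set whose ratio drifts from the expected value towards 0.5 is therefore likely to be enriched for false positives, which is the logic the paragraph above relies on.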

Common variant association analysis. We performed single variant association 
analysis in our exome sequencing data set. For SNVs with MAF greater than 5%, 
we ran logistic regression adjusted for 10 principal components; for 
SNVs with MAF less than 5%, we ran Fisher’s Exact test. We performed association 
analysis in European Americans and African Americans separately and then per- 
formed sample size weighted meta-analysis using METAL”. The association results 
are shown in Supplementary Fig. 14. 
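
For readers unfamiliar with sample-size-weighted meta-analysis, the following sketch shows the usual weighted Z-score combination: each study's two-sided P value is converted to a signed Z-score and weighted by the square root of its sample size. It is an illustration of the general scheme, not METAL's code; the function name, the tuple layout and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def sample_size_weighted_meta(studies):
    """Combine studies given as (two_sided_p, effect_direction, n) tuples.

    effect_direction is +1 or -1 (sign of the effect for the tested allele).
    Returns (z_meta, two_sided_p_meta).
    """
    numerator = 0.0
    sum_sq_weights = 0.0
    for p, direction, n in studies:
        if not 0.0 < p <= 1.0:
            raise ValueError("P values must lie in (0, 1]")
        z = -_STD_NORMAL.inv_cdf(p / 2.0)   # magnitude of the Z-score
        z = z if direction >= 0 else -z     # restore the effect direction
        weight = sqrt(n)
        numerator += weight * z
        sum_sq_weights += weight * weight
    z_meta = numerator / sqrt(sum_sq_weights)
    p_meta = 2.0 * _STD_NORMAL.cdf(-abs(z_meta))
    return z_meta, p_meta


if __name__ == "__main__":
    # Hypothetical per-ancestry results: (P value, direction, sample size).
    z, p = sample_size_weighted_meta([(0.01, +1, 1200), (0.20, +1, 800)])
    print(f"Z = {z:.3f}, P = {p:.4g}")
```

Because only P values, directions and sample sizes are needed, this scheme can combine analyses that used different tests, such as logistic regression for common SNVs and Fisher's exact test for rarer ones.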

Rare variant association analysis. To test whether rare mutations contribute to 
MI, we performed burden of rare variant analysis on the ~2,000 ESP EOMI exome 
samples. We performed a variant of the Combined Multivariate Collapsing test”', 
that groups the count of alleles of SNVs in cases and controls. Phenotype labels 
were permuted 100,000 times to assign a statistical significance. We accounted for 
ethnicity by permuting phenotype labels within each ethnicity. Association ana- 
lysis was performed using PLINK/SEQ. 
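
The permutation scheme described above can be sketched as follows. This is a simplified stand-in for the collapsing test, not the PLINK/SEQ implementation: the test statistic (the number of rare alleles carried by cases for one gene), the one-sided comparison, the add-one correction to the P value and all names are assumptions made for illustration. Phenotype labels are shuffled only within each ancestry stratum, as in the text.

```python
import random
from collections import defaultdict


def burden_permutation_p(rare_allele_count, is_case, stratum,
                         n_perm=10_000, seed=0):
    """One-sided permutation P value for an excess of rare alleles in cases.

    rare_allele_count: per-sample count of rare alleles in the gene
    is_case:           per-sample booleans (True = case)
    stratum:           per-sample ancestry label; labels are permuted within strata
    """
    rng = random.Random(seed)
    observed = sum(c for c, y in zip(rare_allele_count, is_case) if y)

    # Group sample indices by stratum so that shuffling never crosses strata.
    groups = defaultdict(list)
    for i, s in enumerate(stratum):
        groups[s].append(i)

    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        for idx in groups.values():
            shuffled = [labels[i] for i in idx]
            rng.shuffle(shuffled)
            for i, y in zip(idx, shuffled):
                labels[i] = y
        stat = sum(c for c, y in zip(rare_allele_count, labels) if y)
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 20 samples in one stratum, all six carriers are cases.
    counts = [1] * 6 + [0] * 14
    cases = [True] * 10 + [False] * 10
    print(burden_permutation_p(counts, cases, ["EA"] * 20, n_perm=2000, seed=1))
```

Permuting within strata preserves the case-control ratio of each ancestry group, so differences in rare-variant load between ancestries do not masquerade as association.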

We collapsed variants based on computational predictions from PolyPhen-2 
HumDiv*’. Minor allele frequencies were calculated from all available samples 
sequenced in each study in order to obtain the most accurate MAF estimates. 
Therefore, calculation of MAF for ESP EOMI 1 and 2 was performed on a larger 
set of exome samples that were sequenced at the Broad Institute as part of ESP 
(n = 970 exomes for ESP EOMI 1 and n = 3,014 for ESP EOMI 2). For our burden 
of rare variant association analysis, we use a MAF threshold of 1% (T1). 
Furthermore, we use three different types of variant groupings when collapsing by gene. 
These variant groups are: (1) non-synonymous only; (2) a deleterious set consisting of 
non-synonymous after excluding missense alleles annotated as benign by PolyPhen-2 
HumDiv software; and (3) disruptive (nonsense, indel frameshift, splice-site) muta- 
tions only. We also performed the T1 test after collapsing all non-synonymous 
mutations by KEGG pathways (Supplementary Figs 21 and 22). 

Methods for follow-up statistical imputation 

Construction of reference panels and targeted imputation panels. Exome impu- 
tations were performed using two reference panels and 16 targeted imputation panels. 
A total of 697 ESP samples (436 African Americans and 261 European Americans) 
were used for the first reference panel while 89 samples from the 1000 Genomes 
Project** were drawn for the second reference panel. For the ESP reference panel, 
all samples from ARIC (n = 212), JHS (n = 119), MGH PCAD or HARPS (n = 151) 
and WHI studies (n = 41) were genotyped using commercially available Affymetrix 
6.0 arrays. Samples from the FHS (n = 174) were genotyped using the Affymetrix 
5.0 array. The second reference panel was comprised of samples from the 1000 
Genomes Project that had genotype data for both low coverage sequencing and 
high coverage exome sequencing data™. A total of 89 samples were selected from 6 
diverse populations (23 African Ancestry in Southwest US (ASW), 9 Utah residents 
with Northern and Western European ancestry (CEU), 12 Colombian in Medellin, 
Colombia (CLM), 25 Mexican Ancestry in Los Angeles, CA (MXL), 17 Toscani in 
Italia (TSI) and 3 Yoruba in Ibadan, Nigeria (YRI) samples). Low-coverage whole 
genome sequencing, high-coverage exome sequencing and targeted exome capture 
were performed based on standard protocols at the Broad Institute. Details of the 
sequencing methods and samples have been described previously**. Imputation was 
performed into 16 independent study samples with genome-wide genotype data. 
Study samples were genotyped using commercially available Affymetrix or Illumina 
genotyping arrays. Further details are described in Supplementary Table 5. 

Reference panels were created by merging genotypes from SNVs that span the 
entire genome (hence, providing a haplotype ‘scaffold’), with genotypes from SNVs 
from ESP exome sequencing data. The first reference panel was generated using 
genotypes from both genome-wide SNV arrays obtained from dbGAP and exome 
sequencing data. The second reference panel was generated using genotype data 
for both low coverage sequencing and high coverage exome sequencing data. Both 
the reference panel and targeted genome-wide panel were phased using the ‘best 
guess haplotypes’ option in IMPUTE2 (ref. 39). Haplotype phasing was performed 
in 5 megabase chunks as recommended by the software tutorial”. 

Data processing, quality control and association analysis. Imputation of the 
exome was performed using IMPUTE2. We imputed approximately 400,000 cod- 
ing SNVs from the reference panels into 28,068 cases and 36,064 controls from 16 
different study samples with genome-wide data. Descriptions for the study samples 
have been reported elsewhere (Supplementary Table 5 for references). We filtered 
SNVs with MAF < 1% and imputation quality (INFO) <0.5 from further analysis. 
The distribution of imputation qualities of the SNVs is shown in Supplementary 
Figs 23 and 24. Association testing for CAD/MI was performed using the score 
method and assuming an additive model in SNPTEST”. Age, sex and the first two 
principal components were used as covariates when appropriate. We did not observe 
any indication of excess inflation of test statistics in any of the study samples 
(Supplementary Table 22). Meta-analysis of study-specific P values for imputed 
SNVs was performed using the Z-score method weighted by sample size in METAL. 
Beta and standard errors were estimated based on an inverse-variance-weighted meta-analysis. 
The distribution of association results for the imputation results is shown in Sup- 
plementary Fig. 25 and top association results in Supplementary Table 6. 
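
The pooled beta and standard error mentioned above are commonly obtained by weighting each study's estimate by the inverse of its variance. The sketch below assumes that this fixed-effect scheme is the one meant; the function name and the example inputs are hypothetical and the code is not taken from METAL.

```python
from math import sqrt
from statistics import NormalDist


def inverse_variance_meta(estimates):
    """Fixed-effect meta-analysis of (beta, standard_error) pairs.

    Returns (beta_meta, se_meta, two_sided_p).
    """
    sum_w = 0.0
    sum_w_beta = 0.0
    for beta, se in estimates:
        if se <= 0:
            raise ValueError("standard errors must be positive")
        w = 1.0 / (se * se)        # weight = inverse variance
        sum_w += w
        sum_w_beta += w * beta
    beta_meta = sum_w_beta / sum_w
    se_meta = sqrt(1.0 / sum_w)
    z = beta_meta / se_meta
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta_meta, se_meta, p


if __name__ == "__main__":
    # Hypothetical per-study log odds ratios and standard errors.
    beta, se, p = inverse_variance_meta([(0.25, 0.10), (0.10, 0.08), (0.30, 0.20)])
    print(f"beta = {beta:.3f}, se = {se:.3f}, P = {p:.3g}")
```

Unlike the sample-size-weighted Z-score, this approach yields an interpretable pooled effect size, which is presumably why both were reported.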
Methods for follow-up array-based genotyping 

Laboratory methods. DNA samples were sent to the Broad Institute Genetic Ana- 
lysis Platform for genotyping and were placed on 96-well plates for processing using 
the Illumina HumanExome v1.0 SNP array. Genotypes were assigned using 
GenomeStudio v2010.3 using the calling algorithm/genotyping module version 
1.8.4 along with the custom cluster file StanCtrExChp_CEPH.egt. Only samples 
passing an overall call rate of 98% criteria and standard identity check were released 
from the genetic analysis platform. 

Data processing, quality control and association analysis. To identify single low- 
frequency SNVs associated with MI or CAD, we performed array-based genotyping 
using the Illumina Human Exome Beadchip. We genotyped 83,680 sites identified 
from exome sequencing in 1,027 early-onset MI cases and 946 controls. The sam- 
ples for genotyping were drawn from the cohorts listed in Supplementary Table 7 
and have been previously described. The functional effect of each variant was 
predicted using the SeattleSeq Annotation server. For variants having more than 
one functional class, the most deleterious class was retained. 

Several quality control processes were employed to ensure high quality geno- 
types and samples were used in the association analysis. Samples were excluded for 
the following criteria: greater than 5% missing genotypes; discordance between 
inferred gender based on genotype and self-reported gender; inbreeding coefficient 
less than −0.2 or greater than 0.2; duplicated samples; or proportion of genotypes 
identical by descent >0.2. In addition, principal components were calculated using 
Eigenstrat 4.2 (ref. 41) and samples were removed if they were found to be statistical 
population outliers. Variants were removed for the following criteria: MAF = 0%; 
significant difference between missingness in cases compared with controls; extreme 
deviation from Hardy-Weinberg equilibrium (P < 1 X 10°); or significant asso- 
ciation with genotyping plate assignment. All quality control filtering was 
performed using PLINK” and R (The R Project for Statistical Computing, Vienna, 
Austria). 

Association testing for CAD/MI was performed within each study separately 
using logistic regression with ten principal components of ancestry as covariates. An 
inverse standard-error weighted meta-analysis was performed to combine results 
across studies. The association testing was performed using PLINK” and the meta- 
analysis was performed using METAL. There was no indication of an inflation of 
test statistics across studies (Supplementary Table 23). The stability of logistic regres- 
sion was assessed by examining the standard error of the beta estimate as a function 
of minor allele frequency (see Supplementary Fig. 32). As shown, logistic regression 
is unstable for a MAF < 0.05%. Fisher’s Exact test was used for variants with MAF 
< 0.05%. The top association results are shown in Supplementary Table 8. 
Methods for follow-up re-sequencing 
Selection of genes. We first selected six associated genes (based on biologic and/or 
statistical evidence with T1 P < 0.005; APOA5, CHRM5, SMG7, LYRM1, APOC3, 
NBEAL1) for replication sequencing in the ATVB study (Supplementary Table 24) 
where all cases had suffered an MI before age of 46. We also pursued the same six 
genes in the Ottawa Heart Study with 552 cases and 586 controls (Supplementary 
Table 25). One of the genes (APOA5) continued to show significant results and 
was sequenced in three additional studies (Table 1 and Supplementary Table 26). 
In total, we performed follow-up sequencing of APOA5 in six study samples, 
including the Verona heart study (VHS), Ottawa heart study (OHS), additional exomes 
from atherosclerosis, thrombosis, and vascular biology Italian study group (ATVB), 
additional exomes from the ESP EOMI study (ESP EOMI 2), Precocious Coronary 
Artery Disease Study (PROCARDIS), and the Copenhagen City Heart Study and 
Copenhagen Ischaemic Heart Disease Study (CCHS/CIHDS). 
Laboratory methods. For the VHS study, genomic DNA was extracted from white 
blood cells using the salting-out method. The protein-coding regions correspond- 
ing to the RefSeq transcripts NM_052968 for APOA5 and NM_012125 for CHRM5 
were sequenced using in-house designed primers (available on request) and the 
BigDye Terminator Cycle Sequencing Kit v1.1 on an ABI-3130XL Genetic Analyzer 
(Applied Biosystems, Foster City, CA). SNVs were called using the Variant Reporter 
software v1.1 (Applied Biosystems). 

For the OHS study, PCR primers were designed, tested and optimized to target 
the exons and flanking non-coding sequences for each gene. Sequencing reactions 
were performed using big dye terminator chemistry and chromatograms obtained 
with an Applied Biosystems ABI 3730XL capillary sequencer. Chromatograms were 
base-called by using Phred, assembled into contigs by using Phrap, and scanned for 
SNVs with PolyPhred* to identify polymorphic sites. Each read was trimmed to 
remove low-quality sequence (Phred score <25), resulting in analysed reads with 
an average Phred quality of 40. After assembly and variant calling, each polymorphic 
site was reviewed by a data analyst using Consed“ to ensure the quality and accuracy 
of the variant calls. This process generates sequence-based SNV genotypes with 
accuracy >99.9%. 

For the PROCARDIS study, a single long range PCR product (LRPCR) was 
amplified to provide coverage of the APOA5 exonic, intronic and flanking sequences 
(human reference sequence NCBI build 37 chromosome 11:116,659,905-116,664,331). 
The LRPCR products were tagged with unique sequence (barcode) adaptors, and 
processed into 56 short amplicons (Reflex reactions, http://www.populationgenetics.com) 
and pooled for multiplex next-generation sequencing (NGS). NGS was performed 
on a MiSeq personal sequencer to >20× coverage across 95% of the APOA5 
target region on 1,385 MI cases and 1,499 controls. Paired-end reads were mapped 
to NCBI build 37 using the BWA and SMALT aligners; variants were identified by 
the GATK unified genotyper (v1.6.13) and annotated using SnpEff v2.0.5 and the 
GRCh37.64 database. 

For the CCHS/CIHDS study, lightscanner screening and re-sequencing were 
performed. Genomic DNA was isolated from frozen whole blood (QiaAmp4 DNA 
blood mini kit; QIAGEN, Hilden, Germany). Six PCR fragments were amplified 
covering the three coding exons and adjacent splice-sites (approximately 20 base 
pairs upstream and downstream each exon) of APOA5. Mutational analysis of the 
PCR products was performed by high-resolution melting curve (HRM) analysis 
using the Lightscanner system (Idaho Technology, Salt Lake City, Utah). PCR 
fragments showing heteroduplex formation by HRM analysis were subsequently 
sequenced on an ABI 3730 DNA analyser (Applied Biosystems, Foster City, CA). 
Data processing, quality control and association analysis. After sequencing, 
variants were annotated using SnpEff or Annovar**. For each study, only non- 
synonymous SNVs with MAF <1% were analysed. Rare variant burden testing was 
performed using the T1 test. Meta-analysis was performed to combine evidence 
across study specific P values using the sample size weighted Z-score method, 
implemented in METAL. Association results and a listing of APOA5 mutations 
discovered from sequencing are described in Table 1 and Supplementary Table 10. 
P values for association between APOA5 mutation carrier status and lipid traits 
were calculated using the Mann-Whitney rank-sum test. Results are shown in 
Supplementary Table 12. 

Methods for follow-up exome sequencing 

Laboratory methods. We performed follow-up exome sequencing in additional 
samples from three other studies. Sequencing was performed at the Broad Institute, 
using the same protocols described above for the NHLBI ESP Project. 

Data processing, quality control and association analysis. Variant calling and 
annotations were performed as described above for the NHLBI ESP EOMI. Quality 
control of samples was performed using the following steps. To detect mismatched 
samples, we calculated discordance rates between genotypes from exome sequenc- 
ing with genotypes from array-based genotyping. We removed samples with dis- 
cordance rate > 0.02. We tested for sample contamination using verifyBamID*, 
which examines the proportion of non-reference bases at reference sites. We removed 
samples with FREEMIX or CHIPMIX scores >0.2. Furthermore, we removed out- 
lier samples with too many or too few SNVs (>700 or <5 singletons, >400 or <5 
doubletons, > 16,000 (>20,000 for African) or <10,000 total SNVs), and those with 
too high or low TS-TV (>4 or <3) and heterozygosity (heterozygote to homo- 
zygote non-reference ratio >6 or <2). Finally, we removed samples with high 
missingness (>0.1). In total, 202 samples were removed. For quality control of 
variants, we removed SNVs and indels that had low recalibration scores after run- 
ning GATK VariantRecalibrator. We also removed SNVs with low coverage (DP 
<140,000 and quality over depth (QD) <2) and high missingness (frequency of 
missing genotypes >0.02). For quality control of indels, we removed indels that 
had excessive strand bias (Fisher Strand >200), high proportion of alternate alleles 
seen near the ends of reads (ReadPosRankSum < −20), deviation from 
Hardy-Weinberg equilibrium (InbreedingCoeff < −0.8) and low coverage (QD <3). Rare 
variant association analysis was performed using EPACTS. We performed burden 
of rare variant analysis using the Efficient Mixed-Model Association eXpedited 
(EMMAX) Combined Multivariate and Collapsing (CMC) test’. This approach 
uses a kinship matrix to take into account population structure. We restricted ana- 
lyses to SNVs and indels with minor allele frequency <0.01. Furthermore, we re- 
stricted analyses to three different sets of variants: (1) non-synonymous only; (2) a 
deleterious set consisting of non-synonymous after excluding missense alleles anno- 
tated as benign by PolyPhen-2 HumDiv software; and (3) disruptive (nonsense, 
indel frameshift, splice-site) mutations only. 

Estimation of heritability explained by a burden of rare mutations in the 
APOA5 and LDLR genes. We calculated the heritability explained by a burden 
of rare mutations in the APOA5 and LDLR genes using the following assumptions. 
We assumed that the alleles come from a mixture of two distributions: harmless 
alleles, with no effect on the trait, and null alleles, which destroy the function of the 
gene and have a (constant) effect on the trait. We assumed different values for the 
fraction of null alleles, α (our current expectation for α for most genes is around 
one-third to one-half for missense alleles; here we group missense alleles together 
with nonsense alleles, which should slightly increase α). The variance explained is 
sensitive to this parameter. We assumed a liability-threshold model for disease, 
with an underlying (un-observed) continuous trait representing risk for MI, and 
MI occurring if risk is above a certain threshold. We assume all null alleles have 
effect β (in units of standard deviations) on the liability scale. We assumed different 
values for the prevalence (denoted κ) of early MI (3% to 5%). Results are 
somewhat sensitive to prevalence; higher prevalence will slightly increase heritability 
estimates. Given the prevalence, the number of carriers in cases and controls gives 
us the allele frequency in the population (which is very close to the allele frequency 
in controls). 

We fitted the effect size (β on liability scale) and alleles for different values of α 
and κ. Results for APOA5 are shown in Supplementary Table 11 and results for 
LDLR are shown in Supplementary Table 19. For APOA5, β is moderate (up to 
roughly one standard deviation), with variance explained between 0.08% and 
0.17% of the total phenotypic variance (on the liability scale). If we assume the 
heritability of MI is 50%, a burden of rare mutations in the APOA5 gene may 
explain 0.16-0.34% of the heritability. For LDLR, for all values, variance explained 
is between 0.13% and 0.32% of the total phenotypic variance (on the liability scale) 
and 0.26-0.64% of the heritability. 
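
The arithmetic linking variance explained to share of heritability can be sketched under the liability-threshold assumptions listed above. The formula q(1 − q)β² for the variance contributed by null-allele carriers (with q the carrier frequency multiplied by the null fraction) is a standard approximation assumed here for illustration and may differ from the authors' exact fitting procedure; all function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def liability_threshold(prevalence):
    """Threshold on a standard-normal liability scale for a given prevalence."""
    return _STD_NORMAL.inv_cdf(1.0 - prevalence)


def carrier_risk(beta, prevalence):
    """Disease risk for a null-allele carrier whose mean liability is shifted by beta.

    Assumes unit liability variance among carriers (an approximation).
    """
    return 1.0 - _STD_NORMAL.cdf(liability_threshold(prevalence) - beta)


def variance_explained(carrier_freq, null_fraction, beta):
    """Liability-scale variance explained by null-allele carriers.

    carrier_freq:  population frequency of rare-allele carriers
    null_fraction: assumed fraction of rare alleles that are null (alpha)
    beta:          effect of a null allele in standard-deviation units
    """
    q = carrier_freq * null_fraction
    return q * (1.0 - q) * beta ** 2


def share_of_heritability(variance, heritability=0.5):
    """Fraction of the heritability accounted for by the given variance."""
    return variance / heritability


if __name__ == "__main__":
    # Values quoted in the text for APOA5: 0.08% and 0.17% of phenotypic variance.
    for v in (0.0008, 0.0017):
        print(f"{v:.2%} of variance -> {share_of_heritability(v):.2%} of heritability")
```

With an assumed heritability of 50%, dividing by 0.5 simply doubles each variance figure, which reproduces the 0.16-0.34% and 0.26-0.64% ranges stated above.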
Sample size extrapolations and power calculations for burden of rare variants. 
We evaluated the sample size that is needed to reach genome-wide significance 
levels (P = 2.5 × 10⁻⁶) for the T1 test. Our calculations relied on the following 
assumptions. We assumed that all allelic variants with population frequency less 
than 1% are causal and have identical effect sizes. We also assumed that all alleles 
with frequency greater than 1% were benign. 

Our calculations differentiate between the allele frequency of a SNV in our 
exome samples and its true allele frequency in a population. The T1 test compares 
the number of carriers of an allele for a SNV with sample (rather than population) 
allele frequency less than 1% among cases and controls. We considered three 
factors when extrapolating to larger sample sizes. First, we assumed our sample 
is comprised of 50% cases and 50% controls. As the prevalence of EOMI is esti- 
mated to be 5%, the sample frequency of causal alleles is likely to be higher than the 
population frequency. Second, some alleles with population frequency below 1% 
may, by chance, have sampling frequency greater than 1% and therefore be excluded 
from the test. Third, the true allele frequency of the SNVs in the population is 
unknown. In contrast to earlier work that relied on population genetics modelling, 
we provide an update on the power needed to detect rare variant signal after con- 
sidering the three factors above. We calculated liberal and conservative estimates 
for our sample size extrapolations and power calculations. The conservative esti- 
mate was based on the estimate of the total population frequency of all causal 
alleles (below 1%) that would be unlikely to be excluded from the T1 test due to the 
sampling frequencies exceeding 1%. Because the allele frequency distribution is dominated 
by rare alleles, for an allele with sampling frequency $\hat{x}$, the expected 
population allele frequency $x$ is smaller than $\hat{x}$:

$$E(x \mid \hat{x}) < \hat{x} \qquad (1)$$


Therefore, the expected total population frequency of all alleles below frequency x 
is smaller than the total sampling frequency of alleles below sampling frequency x. 
However, setting x at 1% would result in a liberal rather than conservative estimate 
because alleles with population frequency below 1% may be excluded from the T1 
test as having sampling frequency above 1%. This occurs due to oversampling cases 
(our sample has 50% of cases at disease prevalence of 5%) and sampling variance. 
For example, assuming only one causal allele per gene, the power of the T1 test is 
maximal for the population allele frequency close to 0.5% for a sample of 1,000 
cases and 1,000 controls. For a sample of 10,000 individuals, the chance that a risk 
allele with population frequency of 0.5% would be excluded from the T1 test is 
below 107°, making this threshold even more conservative. Therefore, for a con- 
servative estimate, we have assumed that the total population frequency of all causal 
alleles per gene would equal the total sampling frequency of alleles below 0.5% in 
the ESP sample. Our liberal estimate assumed that all causal alleles will be included 
in the T1 test. We assumed that the total population frequency of all causal alleles 
per gene would equal the total sampling frequency of alleles below 1% in the ESP 
sample. 
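
To make the exclusion argument concrete, the sketch below estimates how often an allele with a given population frequency would exceed the 1% sampling-frequency cutoff in a 1:1 case-control sample. It is an illustration under stated assumptions, not the authors' procedure: each allele is assumed to multiply disease risk by 2.0 at a prevalence of 5%, the 2N chromosomes are treated as independent binomial draws, and all function names are hypothetical.

```python
import math


def sample_allele_frequency(pop_freq, relative_risk=2.0, prevalence=0.05, case_fraction=0.5):
    """Expected allele frequency in a case-control sample (cases are oversampled).

    Assumption: each allele multiplies disease risk by `relative_risk`.
    """
    risk_carrier = relative_risk * prevalence
    freq_cases = pop_freq * risk_carrier / (
        pop_freq * risk_carrier + (1 - pop_freq) * prevalence
    )
    freq_controls = pop_freq * (1 - risk_carrier) / (
        pop_freq * (1 - risk_carrier) + (1 - pop_freq) * (1 - prevalence)
    )
    return case_fraction * freq_cases + (1 - case_fraction) * freq_controls


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, computed in log space."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def exclusion_probability(n_individuals, pop_freq, cutoff=0.01):
    """P(sample allele frequency > cutoff) across 2N independent chromosomes."""
    n = 2 * n_individuals
    p = sample_allele_frequency(pop_freq)
    first_excluded = math.floor(cutoff * n) + 1
    return sum(math.exp(log_binom_pmf(k, n, p)) for k in range(first_excluded, n + 1))


if __name__ == "__main__":
    for n_individuals in (2_000, 10_000):
        print(n_individuals, exclusion_probability(n_individuals, 0.005))
```

Under these assumptions the exclusion probability falls sharply as the sample grows, which is the qualitative point made above; the exact values depend on the assumed relative risk and prevalence.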

Once we extrapolated the number of mutation carriers to 20,000 samples, we 
then performed power calculations to see how many samples would be needed to 
reach a genome-wide significance level for the T1 test (P = 2.5 × 10⁻⁶ after correcting 
for 20,000 genes). Power calculations were performed by first sampling a 
genotype at random from the pool of 20,000 simulated samples. Based on the T1 
carrier status of the drawn sample, we simulated the phenotype based on a calcu- 
lated probability. The phenotype was simulated based on a prevalence rate of 5% 
for disease, carrier status of the random sample and assumed relative risk of 2.0 of 
the mutation. For T1 carriers, the probability of being a case was calculated as the relative 
risk (RR) of a T1 carrier multiplied by the prevalence rate of disease (RR × prevalence 
rate). For non-carriers, the probability of being a case was simply the prevalence 
rate. The case-control ratio was 1:1. We performed sample size extrapolations for 
genes with varying number of T1 mutations (25th percentile, median and 75th per- 
centile of carriers with a T1 mutation for all genes discovered in the exome, Sup- 
plementary Figs 29-31). 
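
The phenotype simulation described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the pool composition (1% carriers), the quota sizes, and the function names are assumptions, and the sketch stops at the simulated carrier counts rather than running the T1 test itself.

```python
import random


def case_probability(is_carrier, prevalence=0.05, relative_risk=2.0):
    """P(case): RR x prevalence for T1 carriers, prevalence for non-carriers."""
    return relative_risk * prevalence if is_carrier else prevalence


def simulate_case_control(carrier_pool, n_cases, n_controls, rng):
    """Draw genotypes with replacement until both quotas are filled.

    Returns the number of T1 carriers among cases and among controls.
    """
    cases = controls = 0
    case_carriers = control_carriers = 0
    while cases < n_cases or controls < n_controls:
        is_carrier = rng.choice(carrier_pool)
        is_case = rng.random() < case_probability(is_carrier)
        if is_case and cases < n_cases:
            cases += 1
            case_carriers += is_carrier
        elif not is_case and controls < n_controls:
            controls += 1
            control_carriers += is_carrier
    return case_carriers, control_carriers


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical pool: 200 T1 carriers among 20,000 simulated samples.
    pool = [True] * 200 + [False] * 19_800
    print(simulate_case_control(pool, n_cases=500, n_controls=500, rng=rng))
```

Repeating the simulation many times and applying a burden test to each replicate would give an empirical power estimate at a chosen significance level.
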

MicroRNAs are short non-coding RNAs expressed in different tissue 
and cell types that suppress the expression of target genes. As such, 
microRNAs are critical cogs in numerous biological processes, and 
dysregulated microRNA expression is correlated with many human 
diseases. Certain microRNAs, called oncomiRs, play a causal role in the 
onset and maintenance of cancer when overexpressed. Tumours that 
depend on these microRNAs are said to display oncomiR addiction. 
Some of the most effective anticancer therapies target oncogenes such 
as EGFR and HER2; similarly, inhibition of oncomiRs using antisense 
oligomers (that is, antimiRs) is an evolving therapeutic strategy. However, 
the in vivo efficacy of current antimiR technologies is hindered 
by physiological and cellular barriers to delivery into targeted cells. 
Here we introduce a novel antimiR delivery platform that targets the 
acidic tumour microenvironment, evades systemic clearance by the 
liver, and facilitates cell entry via a non-endocytic pathway. We find 
that the attachment of peptide nucleic acid antimiRs to a peptide 
with a low pH-induced transmembrane structure (pHLIP) produces 
a novel construct that could target the tumour microenvironment, 
transport antimiRs across plasma membranes under acidic conditions 
such as those found in solid tumours (pH approximately 6), and effec- 
tively inhibit the miR-155 oncomiR in a mouse model of lymphoma. 
This study introduces a new model for using antimiRs as anti-cancer 
drugs, which can have broad impacts on the field of targeted drug 
delivery. 

Silencing aberrantly expressed microRNAs (miRNAs) in vivo has been 
achieved using antisense with various nucleic acid analogues involving 
locked nucleic acids (LNAs), 2′-O-methyl oligonucleotides (for example, 
antagomiRs), and peptide nucleic acids (PNAs) or nanoencapsulated 
PNAs. As with most RNA-based therapies, each of these strategies is 
stymied by non-specific organ biodistribution, reticuloendothelial system 
clearance, and endolysosomal trafficking. Acidosis is a hallmark 
of tumours. The pHLIP peptide forms an inducible transmembrane 
α-helix under acidic conditions, has the ability to translocate membrane-impermeable 
molecules into cells via a non-endocytic route, and, when 
administered systemically, can target a variety of epithelial tumours. 
Exploiting acidity as a general property of the tumour microenvironment, 
we find that the pHLIP peptide can localize to tumours of lymphoid origin 
in a subcutaneous flank model (Fig. 1a) and a model of disseminated 
lymphadenopathy (Fig. 1b), while avoiding the liver. Although pHLIP also 
shows kidney targeting, much of the peptide is cleared by renal excretion 
(Extended Data Fig. 1). To exploit these targeting and delivery properties 
we developed a tumour-targeted antimiR delivery vector (pHLIP-antimiR). 

PNAs are nucleic acid analogues comprising nucleobases joined by 
intramolecular amide bonds. This backbone imparts stability, nuclease 
resistance, and an increased binding affinity for complementary nucleic 
acids. We hypothesized that pHLIP would facilitate the intracellular 
delivery of charge-neutral PNA antimiRs (Fig. 1c), which lack anionic 
phosphodiester groups, to cells within the tumour microenvironment. 
Tethering PNA antimiRs to pHLIP represents a unique approach because 
the multifunctional peptide component both targets tumours and mediates 
lipid membrane translocation. 

Fabrication of pHLIP-antimiR was verified by reversed-phase high- 
performance liquid chromatography (RP-HPLC), tricine SDS–polyacrylamide 
gel electrophoresis (SDS-PAGE), electrophoretic mobility shift 
assay (EMSA), and mass spectrometry (Extended Data Fig. 2a-c). In 
our constructs, the linkage between the PNA and peptide comprised a 
disulphide bond, which can be cleaved in the reducing environment of 
the cytosol (Fig. 1c); therefore, attachment to the inserting carboxy (C) 
terminus of pHLIP promotes the intracellular delivery of the PNA antimiR. 
When administered to A549 cells (Fig. 2a and Extended Data Fig. 2d) 
and Toledo diffuse large B-cell lymphoma (DLBCL) cells (Extended Data 
Fig. 2e, f), which express elevated levels of miR-155 compared with other 
DLBCL cells, a pHLIP-antimiR modified with a 5-carboxytetramethylrhodamine 
(TAMRA) label attached to the PNA resulted in enhanced 
cellular delivery at acidic extracellular pH compared with neutral pH. 
PNA delivery to cells by pHLIP does not appear to be greatly affected 
by sequence since antimiR uptake has been demonstrated with numer- 
ous miRNAs including miR-182 (Fig. 2a and Extended Data Fig. 2d), 
miR-155 (Extended Data Fig. 2e, f), scrambled miR-155, miR-21, and 
miR-210. Delivery of antimiR-155 by pHLIP (pHLIP-anti155) de-repressed 
luciferase in miR-155-overexpressing KB cells that stably 
expressed a miR-155-targeted dual luciferase sensor (Extended Data 
Fig. 2g). Additionally, inhibition of miR-155 by pHLIP-anti155 reduced 
KB cell viability at a dose comparable to LNA (15-base oligonucleotide, 
Exiqon) antimiR-155 delivered by lipofection (Fig. 2b). To demonstrate 
the adaptability of this antimiR delivery technology to silencing other 
miRNAs, pHLIP was attached to a PNA antimiR against miR-21, which 
de-repressed a miR-21 luciferase sensor (Extended Data Fig. 2h). Together, 
these data suggest that pHLIP-antimiR is effective at delivering PNA 
antimiRs to multiple cancer cell types, in which endocytosis is hypothesized 
to be relegated to a supplementary mode of cell uptake due to 
the transport properties of pHLIP. 

Figure 1 | Targeting miR-155-addicted lymphoma using pHLIP. 
a, b, Targeting distribution of pHLIP labelled with Alexa Fluor 750 (A750-pHLIP) 
36 h after systemic administration to (a) a nude mouse with miR-155 
flank tumours (n = 3) and (b) a mir-155LSLtTA mouse with lymphadenopathy 
(n = 3); Alexa Fluor 750 conjugated to cysteine was the control. LN, lymph 
nodes. c, Schematic of pHLIP-mediated PNA antimiR delivery. (1) At pH less 
than 7, the C terminus of pHLIP inserts across lipid bilayers, which facilitates 
delivery of attached antimiR-155. (2) The disulphide between pHLIP and 
antimiR-155 is reduced in the cytosol. (3) Intracellular antimiR-155 is free to 
inhibit miR-155. 

Figure 2 | Intracellular translocation of PNA antimiRs mediated by pHLIP. 
a, Confocal projections of A549 cells incubated with labelled pHLIP-antimiR 
(against control miR-182); scale bars, 25 µm. Red, PNA-TAMRA; blue, 
nucleus. b, Effects of miR-155 inhibition on KB cell viability; all data are 
normalized to cells treated with vehicle buffer. Data are shown as mean ± s.d., 
with n = 3; statistical analysis performed with two-way analysis of variance 
(ANOVA); ***P < 0.001. 

Certain oncomiRs have emerged as pharmacological targets. For exam- 
ple, ectopic expression of miR-155 in mice provided the first evidence 
that dysregulation of a single miRNA could cause cancer. Although 
aberrant expression of miR-155 is characteristic of numerous cancers, 
miR-155 is notorious for its oncogenic involvement in lymphomas. 
We previously developed a Tet-Off-based mouse model in which miR- 
155 expression is induced in haematological tissues and can be attenuated 
with the addition of doxycycline (DOX). Between 2 and 3 months 
of age, these mir-155LSLtTA mice develop disseminated lymphoma, in 
which lymphoid tissues progress from normal histology, to follicular 
hyperplasia, to follicular lymphoma, to DLBCL (Extended Data Fig. 3a, b). 
Although these are aggressive cancers comprising neoplastic B cells with 
a high Ki-67 proliferative index, the disease dramatically regresses upon 
DOX-induced miR-155 withdrawal (Extended Data Fig. 3b, c). Therefore, 
this is a model of oncomiR addiction in which tumorigenesis is dependent 
on expression of miR-155 and its removal leads to cancer regression. 

We assessed the therapeutic efficacy of pHLIP-anti155 in vivo using 
two tumour models based on mir-155LSLtTA mice: (1) nude mice subcutaneously 
implanted with neoplastic B cells derived from the enlarged spleens 
of mir-155LSLtTA mice (Extended Data Fig. 4a) and (2) mir-155LSLtTA mice 
after progression to conspicuous lymphadenopathy (Extended Data 
Fig. 4b). Continuous suppression of miR-155 via DOX-impregnated 
mouse chow or a cocktail of chemotherapeutics and anti-inflammatory 
steroids (CHOP) served as positive controls that each caused tumour 
regression (Extended Data Fig. 5a). Since CHOP is part of the current standard 
of care for human lymphomas, the similar response to treatment 
with DOX and CHOP demonstrated the potential utility of antimiR-155 
cancer therapy. Accordingly, intravenous administration of pHLIP-anti155 
to the flank tumour model resulted in a significant reduction in 
tumour growth (Fig. 3a). In a subsequent study at a higher dose, pHLIP-anti155 
showed a significant survival advantage compared with a commercially 
available LNA (Exiqon) antimiR optimized for in vivo miR-155 
silencing (Fig. 3b and Extended Data Fig. 5b). After administration of 
pHLIP-anti155, mice exhibited no clinical signs of distress, toxicity, or 
renal damage (Extended Data Fig. 5c). Note that the dose of pHLIP-anti155 
used in this study was much lower (ranging from 17- to 40-fold) 
than that used in other antimiR delivery reports. 

Figure 3 | Targeted silencing of miR-155 has beneficial effects in mice with 
subcutaneous mir-155LSLtTA tumours. a, Tumour growth response to 
treatment; arrows represent 1 mg kg⁻¹ PNA dose per intravenous injection; all 
with n = 3, except for pHLIP-anti155 group with n = 4. b, Survival in response 
to antimiR treatment; cutoff criteria include tumour volume greater than 1 cm³ 
or clinically mandated euthanasia. Symbols represent 2 (arrowhead) or 1 
(arrow) mg kg⁻¹ intravenous injections; LNA is a fully phosphorothioated 
LNA antimiR against miR-155; n = 4 for all groups; (*) for pHLIP-anti155 
compared with LNA. c, Representative histological analysis of livers 
(haematoxylin and eosin (H&E), ×200 magnification) harvested from early 
endpoint study (Fig. 3a and Extended Data Fig. 5a). d, Mass range of spleens 
from mice in early endpoint study; all with n = 3, except for pHLIP-anti155 
group with n = 4. e, Time to development of conspicuous lymphadenopathy in 
survival study; (**) for pHLIP-anti155 compared with mock. Data are shown as 
mean ± s.d.; statistical analysis performed with (a) two-way ANOVA, 
(b, e) Mantel–Cox test, or (d) two-tailed Student’s t-test; *P < 0.05; **P < 0.01; 
***P < 0.001. 


In addition to delaying tumour growth, pHLIP-anti155 treatment suppressed 
the metastatic spread of neoplastic lymphocytes to other organs. 
The liver, lymph nodes, and spleen were common targets for metastatic 
lymphocytes. In a blinded pathological assessment, livers from mice 
treated with pHLIP-anti155 and DOX had rare scattered aggregates of 
one to three neoplastic lymphocytes, while livers in the negative control 
groups typically had dense tumoral aggregates of up to two dozen cells 
scattered throughout the entire organ (Fig. 3c). Note that these tissues 
were harvested at an early endpoint (that is, when the negative controls 
reached a tumour size of 1 cm³, Fig. 3a) in relation to the survival study 
(Fig. 3b) to resolve pharmacological effects. Early endpoint treatment with 
pHLIP-anti155, DOX, and CHOP reduced the onset of splenomegaly 
(as judged by spleen mass), which occurred in all of the negative control 
groups (Fig. 3d). Additionally, pHLIP-anti155 significantly delayed the 
development of conspicuous lymphadenopathy (Fig. 3e), which was particularly 
evident in the inguinal and axillary lymph nodes throughout 
all of the groups (Extended Data Fig. 5d). 

On the basis of a blinded complete blood count analysis, the negative 
control groups had a large number of atypical mononuclear 
cells of lymphoid origin, consistent with the leukaemic phase of lymphoma. 
Mice treated with pHLIP-anti155 and DOX had levels of circulating 
lymphocytes similar to wild type, while CHOP treatment resulted 
in lymphocyte levels much lower than wild type (Extended Data Fig. 5e). 
Although pHLIP can target metastasized lymph node tumours (Extended 
Data Fig. 1c), the therapeutic effects on the levels of circulating 
lymphocytes suggest that the lower incidence of metastatic spread is 
probably due to antimiR activity at the primary tumour. These findings 
support the effective targeting of systemic antimiR-155 therapy to 
neoplastic cells (Extended Data Fig. 5f). The additional lymphopenia 
caused by CHOP treatment probably reflects the general toxicity of non-targeted 
conventional chemotherapy drugs (Extended Data Fig. 5e). The 
absence of systemic toxicity may represent an important advantage for 
pHLIP-targeted antimiR therapy. Importantly, when healthy C57BL/6 
mice were treated at the highest dose and frequency used in this study, 
pHLIP-anti155 showed no significant impairment of liver and kidney 
function (Extended Data Fig. 6a). Additionally, white blood cell levels 
(Extended Data Fig. 6b), body mass (Extended Data Fig. 6c), and organ 
mass (Extended Data Fig. 6d) were all within normal ranges. 

In addition to the miR-155-addicted lymphoma subcutaneous tumour 
model, pHLIP-anti155 was also effective at treating KB cell xenograft 
tumours, which stably expressed luciferase for intravital monitoring of 
tumour bioluminescence (Extended Data Fig. 7), as well as disseminated 
tumours in mir-155LSLtTA mice. Although implanted subcutaneous 
tumour models are effective for evaluating tumour growth, spontaneous 
cancer models arising in endogenous tissues are a more clinically 
relevant means of assessing therapeutic efficacy. Remarkably, systemically 
administered pHLIP-anti155 accumulated in the enlarged lymph 
nodes of the transgenic mir-155LSLtTA mice (Fig. 4a). Furthermore, like 
most therapeutics, PNA oligomers are known to be cleared by the reticuloendothelial 
system, which results in accumulation in the liver; pHLIP-anti155 
showed approximately 10-fold reduction in liver accumulation 
compared with anti155 alone (Fig. 4a and Extended Data Fig. 8a, b). 
The therapeutic impact of pHLIP-anti155 in mir-155LSLtTA mice was 
supported by a statistically significant decrease in spleen size and a 
reduction in lymph node tumour burden that was not statistically significant (Fig. 4b 
and Extended Data Fig. 8c-e). An increase in apoptosis that was not statistically 
significant was also observed in the lymph nodes of treated mice (Extended 
Data Fig. 8f, g). Interestingly, blinded histopathological analysis revealed 
that spleens in pHLIP-anti155-treated mir-155LSLtTA mice had differentiated 
red and white pulp (similar to wild-type mice with no treatment), 
while the splenic architecture of mir-155LSLtTA mice treated with pHLIP-antiscr 
was almost completely effaced (Fig. 4c and Extended Data Fig. 8h). 
As with the subcutaneous tumour studies, pHLIP-anti155 treatment also 
showed a 12-fold reduction in liver metastasis (Fig. 4d and Extended Data 
Fig. 8i, j), while flow cytometric analysis revealed reductions in populations 
of B220-expressing spleen cells (Extended Data Fig. 8k). Consistent 
with a lack of systemic toxicity, treatment with pHLIP-anti155 produced 
no histopathological kidney damage (Extended Data Fig. 8l). Lastly, all 
mice that developed lymphoma-induced paresis showed improved motor 
skills after pHLIP-anti155 treatment (Supplementary Videos 1-4, n = 3). 

For a more direct assessment of miR-155 silencing in mir-155LSLtTA 
mice, we monitored the levels of miR-155 targets in response to antimiR 
treatment. As an oncogene in lymphoma, miR-155 suppresses genes 
involved in processes such as apoptosis, proliferation, immune response 
regulation, as well as cell differentiation and development. However, 
the addiction mechanisms by which lymphoma regresses upon miR-155 
withdrawal are unknown. Typically, miR-155 targets have been identified 
by differential gene expression analysis of an overexpression condition 
compared with wild type. To uncover the genes required for miR-155 
addiction, we performed RNA sequencing (RNA-seq) analysis on miR- 
155-addicted lymphoid tumours compared with regressing tumours 
(Extended Data Fig. 9a). This is the first study, to our knowledge, to iden- 
tify miRNA cancer targets that directly result from oncomiR withdrawal. 
Out of 29,209 mouse genes, 2,101 showed significant upregulation or 
downregulation in response to miR-155 attenuation (Extended Data 
Fig. 9b and Supplementary Table 1). Kyoto Encyclopedia of Genes and 
Genomes (KEGG) analysis of upregulated genes revealed that 41% have 
been associated with cancer pathways (Extended Data Fig. 9c). Additionally, 
25% have been implicated in cell adhesion and migration pathways 
such as leukocyte transendothelial migration. We compared the upregulated 
genes to known and putative miR-155 targets (Supplementary 
Table 2) identified using the miRWalk target prediction algorithm. 
At the intersection of these screens, several genes are known to have 
tumour suppressor characteristics (Fig. 4e, Extended Data Fig. 9d and 
Supplementary Table 3). One notable gene is Bach1, a transcription factor 
that has been validated as a miR-155 target in renal cancer and cultured 
B cells. Gene expression analysis was used to validate Bach1 as a miR-155 
target in Toledo cells treated with pHLIP-anti155 (Extended Data 
Fig. 9e) and in mir-155LSLtTA mice undergoing DOX-induced miR-155 
withdrawal (Extended Data Fig. 10). Furthermore, diseased mir-155LSLtTA 
mice treated with pHLIP-anti155 showed an increase in Bach1 levels in 
cancerous axillary, cervical, and inguinal lymph nodes (Fig. 4f). A known 
miR-155 target in lymphoma, Mafb, was also upregulated in response 
to pHLIP-anti155 treatment (Fig. 4f). Therefore, pHLIP-anti155 can 
target lymph node neoplasms and cause effective blockage of miR-155 
activity. 

lymphadenopathy. a, Confocal projections of systemic, tumour-targeted 
delivery of antimiR-155 to mir-155LSLtTA mice using pHLIP; scale bars, 25 µm 
(top, enlarged cervical lymph node) and 250 µm (bottom, liver), n = 3. Red, 
PNA-TAMRA; blue, nucleus. b, Representative mir-155LSLtTA mouse before 
and after treatment with pHLIP-anti155, n = 6. c, d, Representative H&E 
analysis of (c) spleens and (d) livers harvested from diseased littermate 


mice with no treatment. e, Heat map showing selected upregulated genes upon 
miR-155 withdrawal. f, Quantitative PCR (qPCR) determination of gene 
expression levels in lymphoid tissue from mir-155LSLtTA mice. Data are shown 
as mean ± s.d., n = 3; statistical analysis performed with two-tailed Student’s 
t-test; *P < 0.05; **P < 0.01. 

While oncomiRs are proving to be potent anticancer targets, in the- 
ory, using this approach, every miRNA is a ‘druggable’ target. Through 
targeted antagonism of miRNAs, pHLIP-antimiR has vast therapeutic 
potential for cancer and many other pathological conditions that produce 
localized acidic environments such as ischaemia, myocardial infarcts, 
stroke, tissue trauma, and sites of inflammation and infection. The main 
limitation of this transmembrane delivery approach involves the need 
for the drug cargo to have limited charge, such as PNA antimiRs. While 
other antimiR delivery and targeting strategies have been described, 
utilization of pHLIP to target the acidic tumour microenvironment is a 
widely applicable technology that will present new therapeutic and mech- 
anistic opportunities for effective targeting of miRNA silencing. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 


PNA synthesis. Regular Boc-protected PNA monomers were purchased from ASM 
Research Chemicals. All the given oligomers were synthesized on solid-support using 
standard Boc chemistry procedures. The oligomers were cleaved from the resin 
using m-cresol:thioanisole:TFMSA:TFA (1:1:2:6) cocktail solution. The resulting 
mixtures were precipitated with ether (three times), purified, and characterized by 
RP-HPLC and MALDI-TOF, respectively. All PNA stock solutions were prepared 
using nanopure water and the concentrations were determined at 90 °C on a Cary 3 
Bio spectrophotometer using the following extinction coefficients: 13,700 M⁻¹ cm⁻¹ 
(A), 6,600 M⁻¹ cm⁻¹ (C), 11,700 M⁻¹ cm⁻¹ (G), and 8,600 M⁻¹ cm⁻¹ (T). The 23-base 
PNA oligomer complementary to miR-155 has an estimated 
melting temperature (Tm) of 77.8 °C. Single-isomer 5-carboxytetramethylrhodamine 
(TAMRA) purchased from VWR was exclusively conjugated to the amino 
(N) terminus of PNAs with a hydrophilic bifunctional linker, Boc-miniPEG-3 
(11-amino-3,6,9-trioxaundecanoic acid, DCHA, denoted in the sequences by -ooo-) 
purchased from Peptide International. Cysteine was also conjugated to the C terminus 
of PNAs using a Boc-miniPEG-3 linker. 

The following PNA antimiR sequences were used: 
anti155, TAMRA-ooo-ACCCCTATCACAATTAGCATTAA-ooo-Cys; 
antiscr, TAMRA-ooo-ACCCAATCGTCAAATTCCATATA-ooo-Cys; 
anti21, TAMRA-ooo-TCAACATCAGTCTGATAAGCTA-ooo-Cys; 
anti182, TAMRA-ooo-CGGTGTGAGTTCTACCATTGCCAAA-ooo-Cys. 
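
As a worked example of how the per-base extinction coefficients from the PNA synthesis section translate into a concentration, the sketch below sums the coefficients over the anti155 sequence and applies the Beer-Lambert law. Summing per-base values and ignoring any contribution from TAMRA, the linker, or cysteine is an assumption made here for illustration; the function names and the 1 cm path length are likewise hypothetical.

```python
# Per-base extinction coefficients (M^-1 cm^-1) quoted in the PNA synthesis section.
EXTINCTION_M1_CM1 = {"A": 13_700, "C": 6_600, "G": 11_700, "T": 8_600}


def pna_extinction_coefficient(sequence: str) -> int:
    """Sum of per-base coefficients (assumption: bases contribute additively)."""
    return sum(EXTINCTION_M1_CM1[base] for base in sequence.upper())


def concentration_molar(absorbance: float, sequence: str, path_length_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    return absorbance / (pna_extinction_coefficient(sequence) * path_length_cm)


ANTI155 = "ACCCCTATCACAATTAGCATTAA"

print(len(ANTI155), pna_extinction_coefficient(ANTI155))  # 23 232800
```

Under these assumptions an absorbance of 0.2328 would correspond to a 1 µM stock of the 23-base anti155 oligomer.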

Full-length PNA antimiRs were used throughout this study. While current tech- 
nologies such as ‘tiny’ LNAs have seen efficacy with miRNA seed-targeted 8-base 
oligonucleotide antimiRs, truncated PNA antimiRs should be similarly effective 
owing to their high binding affinity, which can be further enhanced with chemical 
modifications. 

Synthesis and characterization of pHLIP-antimiR. To generate pHLIP-antimiR 
constructs, the following pHLIP sequence (New England Peptide) was synthesized: 
AAEQNPIYWARYADWLFTTPLLLLDLALLVDADEGT(CNPys)G; conjugation 
of the C terminus to thiolated-PNA was facilitated by incorporating a cysteine group 
derivatized with 3-nitro-2-pyridinesulphenyl (NPys). To synthesize pHLIP-antimiR 
constructs, pHLIP-Cys(NPys) and antimiR PNA (peptide:PNA 1:1.3) were reacted 
overnight in the dark in a mixture of DMSO/DMF/0.1 mM KH2PO4 pH 4.5 (v/v 
3:1:1) under argon. Note that this protocol was adapted from a general method 
of conjugating peptides to PNAs. Aside from pHLIP, attaching molecules, such as 
cell-penetrating peptides, to PNAs can increase cellular uptake and in vivo delivery 
efficacy. However, these conjugates typically require high doses and distribute 
to tissues throughout the body, which can result in off-target effects. Similarly, 
pHLIP can be attached to other antimiR compositions (such as LNA), which would 
probably improve tumour targeting; however, physicochemical properties of PNA 
make them more amenable to pHLIP-mediated membrane translocation. A750-pHLIP 
was fabricated as previously described. 

Purification and verification of pHLIP-antimiR. After conjugation, pHLIP-antimiR 
was purified by RP-HPLC (Shimadzu) using a C18 column and a mobile phase 
gradient of water and acetonitrile with 0.1% trifluoroacetic acid. Purified pHLIP- 
antimiR was further characterized using matrix-assisted laser desorption/ionization- 
time of flight (MALDI-TOF). Concentrations of pHLIP-antimiR were determined 
on a Nanodrop Spectrophotometer (Thermo Scientific) at 260 nm corrected for peptide 
and TAMRA absorbance. Gelshift analysis used a 20% TBE gel and Bolt electrophoresis 
system (Life Technologies); before loading, samples were incubated 
with an equimolar amount of miR-155, denatured at 95 °C for 2 min, and allowed 
to anneal at 37 °C for 30 min. SYBR Gold (Life Technologies) was used to visualize 
miR-155; pHLIP and free PNA were not detected by the stain. Tricine SDS-PAGE 
used a 16% tricine gel (Life Technologies) and standard SDS-PAGE procedures. 
Samples were visualized first using TAMRA fluorescence on a Maestro 2 Multispectral 
Imaging System (PerkinElmer), and then using Simply Blue Coomassie stain (Life 
Technologies). For disulphide reduction studies, pHLIP-antimiR was reduced for 
30 min in 200 mM DTT for HPLC and EMSA, and 5 mM TCEP for tricine SDS- 
PAGE. For all in vitro and in vivo studies, pHLIP-antimiR was heated at 65 °C for 
10 min to prevent aggregation. 

Animals. All mice were maintained at Yale University in accordance with Yale 
Animal Resource Center and the Institutional Animal Care and Use Committee guidelines. 
The mir-155LSLtTA mice were generated as previously described. For transplant 
studies, 5- to 6-week-old female CrTac:NCr-Foxn1 nude mice (Taconic) were used. 
For toxicology studies, 8- to 9-week-old female C57BL/6J mice (Jackson) were used. 
For treatment of mir-155LSLtTA mice, a sample size of at least four was appropriate 
on the basis of post hoc power analysis using quantitation of spleen size (Extended 
Data Fig. 8d) with a 95% confidence interval. For all animal studies, group allocations 
were randomized and all pathological analyses were blinded to treatment groups and 
expected experimental outcomes. 

Cell culture. For all pH-controlled cell culture experiments, cells (previously tested 
for mycoplasma and supplied from ATCC) were incubated with 10% FBS in RPMI 
buffered at pH 7.4 with HEPES or pH 6.2 with MES, and treated with pHLIP- 
antimiR suspended in reaction buffer which constituted no more than 1% of the 
final volume. 
Histology and other techniques. Harvested tissues were fixed in 10% formalin and 
processed by Yale Pathology Tissue Services for H&E and terminal deoxynucleotidyl- 
transferase-mediated dUTP nick end labelling (TUNEL) staining. Retro-orbitally 
collected whole blood preserved in EDTA or serum separated using lithium hep- 
arin was sent to Antech Diagnostics for complete blood count or clinical chemistry 
analyses, respectively. Image quantification used ImageJ version 1.47 (NIH) and 
Colour Deconvolution plugin (A. C. Ruifrok). Intravital and ex vivo fluorescence 
imaging was performed on either an IVIS Spectrum System (Caliper) or Maestro 2 
Multispectral Imaging System using near-infrared or TAMRA filter sets. Live mice 
were anaesthetized using isoflurane during image acquisition. For whole-organ studies, 
organs were harvested and fixed in 10% formalin before imaging. 
Flank tumour establishment. To establish mir-155LSLtTA lymphoma subcutaneous 
flank tumours, first enlarged spleens were extracted from 2- to 3-month-old 
mir-155LSLtTA mice with obvious lymphadenopathy (which generally correlated with 
incidence of splenomegaly). Using a 100 µm pore-size cell strainer technique, spleen 
tissue was dispersed into a single cell suspension in 5% FBS in PBS on ice. Red blood 
cells were lysed using ammonium chloride lysis buffer (Stem Cell Technologies), 
and 5 X 10° cells were subcutaneously injected into nude mice. Tumours were generally 
palpable within 10 days; tumour volume was calculated as (length × width²)/2. 
For bioluminescent xenograft tumours, KB cells were stably transfected with firefly 
luciferase and clonally selected via hygromycin B selection; 5 × 10° cells were subcutaneously
injected into nude mice to establish tumours. RediJect D-Luciferin Ultra
Bioluminescent Substrate (PerkinElmer) was administered via the manufacturer’s
protocol for intravital monitoring of tumour bioluminescence using IVIS Spectrum
(Caliper). It was pre-established that, for all flank tumour studies, animals were
excluded if their tumours had not reached a volume of 50-100 cm³ by the time of
treatment. Animals were randomized into experimental arms by minimizing the 
differences in mean tumour size and standard deviation. 
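
The volume estimate used for flank tumours, (length × width²)/2, can be sketched as follows. This is a minimal illustration only: the function name and the caliper readings are hypothetical, and millimetres are assumed here purely for the example.

```python
def tumour_volume(length: float, width: float) -> float:
    """Estimate tumour volume as (length x width^2) / 2.

    Both measurements must share one unit (assumed here: mm),
    so the result is in that unit cubed (mm^3).
    """
    if length < 0 or width < 0:
        raise ValueError("caliper measurements must be non-negative")
    return (length * width ** 2) / 2.0


# Hypothetical caliper readings in millimetres.
print(tumour_volume(10.0, 8.0))  # 320.0
```

The formula treats the tumour as an ellipsoid whose depth equals its width, which is why width enters squared.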
Confocal imaging and flow cytometry. For fixed cell confocal preparation, after 
treatment for 1 h with 5 µM of pHLIP-anti155 (Fig. 2a), cells were washed with
1% BSA in PBS, fixed in 4% paraformaldehyde, and permeabilized using 0.1% 
Triton X-100 in PBS. All washes were performed using PBS at pH 7.4 to wash away 
surface-bound pHLIP. Actin and nuclei were stained with Texas Red-X phalloidin 
(Life Technologies) and Hoechst 33342 (Life Technologies), respectively. Cells were 
mounted in Slow Fade Gold (Invitrogen). Alternatively, Toledo cells were treated 
with 500 nM pHLIP-anti155 (Extended Data Fig. 2e), washed with 1% BSA in PBS,
and imaged live without fixation or permeabilization. For tumour and liver tissues, 
organs were harvested and fixed in 10% formalin, then incubated overnight in 30% 
sucrose in PBS. Tumours were flash frozen in OCT before being sliced into 10 µm
sections, permeabilized, stained, and mounted in Vectashield (Vector Labs). Cell
and tissue confocal imaging used a TCS SP5 Spectral Confocal Microscope (Leica);
confocal projections were constructed using LAS AF software (Leica) with 0.9-µm
stack height. For live cell flow cytometry, after 48 h of treatment, cells were washed
five times with 1% BSA in PBS on ice and then analysed on a FACScan (BD Biosci- 
ences) using FlowJo software (Tree Star); for B220 studies, freshly harvested spleen 
cells (see section on Flank tumour establishment) were blocked with 10% FBS (20 min), 
stained with Alexa488-anti-CD45R/B220 (BD clone RA3-6B2, 20 min incubation 
at room temperature at 1 µg ml⁻¹ concentration), washed three times with PBS on
ice, and transferred to 1% BSA 0.1% NaN₃ in PBS on ice before analysis.
Luciferase reporter and cell viability. For dual luciferase reporter experiments, 
the miRNA sensor was generated by inserting the target sequence for miR-155 into 
the 3’ untranslated region of Renilla luciferase on a psiCHECK-2 vector (Promega). 
KB cells were stably transfected using Lipofectamine 2000 (Life Technologies) and 
co-transfection with a Linear Hygromycin Marker (Clontech) followed by clonal 
selection. Utilization of stable clones was more reliable than transiently transfected 
cells for antimiR studies. Cell lysates were measured for luciferase activity 48 h after 
treatment using the Dual-Luciferase Reporter Assay System (Promega). Control 
LNA antimiR-155 (Exiqon) was delivered by lipofectamine RNAiMAX. Optimal 
sensor activity was seen at a 500 nM dose, although inhibition of miR-155 was also 
observed at lower doses. For analysis of miR-21 inhibition A549 cells were similarly 
treated with pHLIP-anti21 and relevant controls; however, the cells were instead trans- 
fected with a miR-21-specific LightSwitch miRNA Target GoClone Luciferase Reporter 
(Active Motif). Cell viability was measured 96h after treatment using CellTiter- 
Glo (Promega). For both luciferase and viability assays, all treatments were per- 
formed at the indicated pH for 24h, then media was replaced with 10% FBS in 
RPMI at physiological pH for extended incubation. 
qPCR. For qPCR analysis of tissue after treatment with two 2 mg kg⁻¹ injections of
pHLIP-anti155 or pHLIP-antiscr spaced 48 h apart, tissues were harvested 24 h after
the last injection and divided into at least five representative 1 mg slices. Tissue slices
were pooled into Trizol (Life Technologies) and homogenized using a Precellys 24
Homogenizer. As per the manufacturer’s protocol, chloroform was added to facilitate
phase separation, and the RNA-containing aqueous phase was collected. An
equal volume of 200 proof ethanol was added, and RNA was purified from this 
mixture using RNeasy kit (Qiagen) and standard procedures with on-column DNase 
I digestion; standard RNeasy purification was followed for RNA extraction from 
cells. Reverse transcription PCR was performed with 1 µg total RNA and poly-A
based iScript cDNA Synthesis Kit (Bio-Rad). Real-time PCR was performed with
Quantitect Primer Assays (Qiagen) and iQ SYBR Green Supermix (Bio-Rad) using
a Roche LightCycler 480 System; all samples were normalized to β-actin.
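
Normalization to a reference gene such as β-actin is often carried out with the comparative Ct (2^-ΔΔCt) calculation. The sketch below assumes that approach and roughly 100% amplification efficiency; the text states only that samples were normalized to β-actin, so the method, the function name and the Ct values are illustrative assumptions.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_control: float, ct_reference_control: float) -> float:
    """Fold change of a target gene relative to a control sample (2^-ddCt).

    Each sample's target Ct is first normalized to its own reference-gene Ct
    (for example beta-actin); the control sample then sets the baseline.
    Assumes the product doubles with every PCR cycle.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: treated sample versus mock control.
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```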

RNA-seq. For RNA-seq analysis, the overexpression and withdrawal groups both 
consisted of three mice with subcutaneous tumours that were established from 
enlarged spleens of diseased donor mir-155LSLtTA mice (Extended Data Fig. 9a). The
overexpression and withdrawal mice were paired such that each of the three pairs 
was from a separate donor littermate. Tissue was harvested once tumours reached 
a volume of 1 cm³; for mice in the miR-155 withdrawal group, DOX was administered
for 16 h before tissue collection. As described in the transplant methods, tumour
tissue was dispersed into a single cell suspension and red blood cells were lysed. 
Total RNA was extracted from the remaining cells using the hybrid Trizol and 
RNeasy protocol described in the qPCR methods. High-quality total RNA (Agilent 
2100 Bioanalyzer RIN value greater than 7) was sent to Expression Analysis for 
library preparation, Illumina TruSeq mRNA sequencing (50-base-pair paired end, 
25 million reads per sample), alignment to the mouse genome (greater than 80% 
aligned to the NCBI37/mm9 assembly), and counts of the number of gene-mapped
fragments given the maximum likelihood abundances. DESeq was used first to estimate
size factors (that is, normalize samples by their respective sizes) and dispersions
(that is, variance between samples), and then to identify differentially expressed genes
(Supplementary Table 1). Heat maps were generated using variance stabilizing
transformations of the count data on the basis of a parametric fit to the overall mean
dispersions.
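
The size-factor step can be illustrated with the median-of-ratios idea that DESeq is known for. This is a pure-Python sketch rather than the DESeq implementation: the function name and the count matrix are hypothetical, and dispersion estimation and testing are not shown.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a gene-by-sample count matrix.

    counts: list of genes, each a list of raw counts (one per sample).
    For every gene with non-zero counts in all samples, each count is
    compared with the gene's geometric mean across samples; a sample's
    size factor is the median of those ratios.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue  # geometric mean is undefined when a count is zero
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
raw = [[10, 20], [100, 200], [30, 60], [0, 5]]
factors = size_factors(raw)
normalized = [[c / f for c, f in zip(gene, factors)] for gene in raw]
print(factors)
```

Dividing each count by its sample's size factor puts the samples on a common scale, so the first three genes in this toy matrix come out equal across the two samples.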

KEGG analysis. The Database for Annotation, Visualization and Integrated Discovery
(DAVID, http://david.abcc.ncifcrf.gov) was used to identify the KEGG pathways
enriched among genes that were both upregulated in response to miR-155
withdrawal and had a false discovery rate less than 0.05. Enriched KEGG pathways
had a minimum count threshold of 2 and a modified Fisher’s exact P value for
gene enrichment less than 0.05.
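
The enrichment criterion can be sketched as a conservative one-sided test. The modified Fisher's exact P value (EASE score) is commonly described as a one-tailed Fisher's exact test evaluated after subtracting one gene from the pathway hits in the list. The code below follows that description using a hypergeometric upper tail; DAVID's exact contingency-table construction may differ, and all names and numbers here are illustrative assumptions.

```python
from math import comb


def hypergeom_upper_tail(k: int, pop_total: int, pop_hits: int, list_total: int) -> float:
    """P(X >= k) when list_total genes are drawn from pop_total genes,
    of which pop_hits belong to the pathway."""
    upper = min(pop_hits, list_total)
    numerator = sum(
        comb(pop_hits, i) * comb(pop_total - pop_hits, list_total - i)
        for i in range(max(k, 0), upper + 1)
    )
    return numerator / comb(pop_total, list_total)


def ease_score(list_hits: int, pop_total: int, pop_hits: int, list_total: int) -> float:
    """Conservative enrichment P value: one hit is removed before testing."""
    return hypergeom_upper_tail(list_hits - 1, pop_total, pop_hits, list_total)


def is_enriched(list_hits, pop_total, pop_hits, list_total,
                min_count=2, alpha=0.05) -> bool:
    """Apply the thresholds stated in the text: count >= 2 and P < 0.05."""
    return list_hits >= min_count and ease_score(
        list_hits, pop_total, pop_hits, list_total) < alpha


# Hypothetical example: 8 of 200 listed genes fall in a 100-gene pathway
# drawn from a 20,000-gene background.
print(ease_score(8, 20000, 100, 200), is_enriched(8, 20000, 100, 200))
```

Removing one hit penalizes pathways supported by very few genes, which is why the score is never smaller than the unmodified one-tailed P value.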
Extended Data Figure 1 | Distribution of pHLIP to the renal system and
lymph node metastases. a, Intravenous injection of A750-pHLIP distributes to
the (white arrow) kidneys and (blue arrow) tumour in a representative
mir-155LSLtTA subcutaneous flank model (n = 3); time points indicate hours
after a single injection of A750-pHLIP. Previous reports have observed systemic 
distribution of pHLIP to kidneys in other mouse models'”. Similarly, we 
speculate that the increased uptake of pHLIP peptide in the kidneys is due to 
excretion and increased acidity of renal tubule cells. Initially kidneys are highly 
enriched for pHLIP, which is gradually excreted while pHLIP shows a more 
steady accumulation in the tumour. b, Representative example showing 
A750-pHLIP distribution to the (white arrow) bladder and (yellow arrow)
enlarged axillary lymph node 36 h after intravenous administration into
mir-155LSLtTA mice with lymphadenopathy (n = 3). c, In addition to
distributing to the (white arrow) primary mir-155LSLtTA flank tumour and
(red arrow) kidneys, A750-pHLIP distributes to (black arrows) enlarged 
lymph nodes that resulted from metastatic spread; intravital fluorescence of 
A750-pHLIP was detected 48 h after intravenous injection into nude transplant 
mice with conspicuous lymphadenopathy (a representative animal from n = 3 
is shown). 
Extended Data Figure 2 | Assessment of pHLIP-PNA conjugation and
activity. a, HPLC elution profiles of (top) free PNA, (middle) reaction mixture 
of PNA and pHLIP-C(Npys), and (bottom) purified pHLIP-PNA incubated 
in DTT. HPLC was used to purify pHLIP-PNA (black arrow). The fluorescence 
detection of TAMRA (ex/em: 540/575) that was conjugated to the PNA is 
shown; samples were also detected by absorbance at 260 and 280 nm (data not 
shown). b, Tricine SDS-PAGE evaluation of pHLIP-PNA conjugation. Gel 
was visualized by (top) TAMRA fluorescence to detect labelled PNA and 
(bottom) Coomassie stain to detect both PNA and peptide. c, Gelshift analysis 
of pHLIP-antimiR-155 binding to miR-155 and disulphide reduction in the 
presence of DTT. d, High-magnification confocal projections of A549 cells 
incubated with labelled pHLIP-antimiR (against control miR-182); scale bars,
7.5 µm. The diffuse intracellular fluorescence is indicative of freely distributed
antimiR throughout the cytosol; note that the presence of marginal punctate
fluorescence at both pH levels suggests that endocytosis is probably an
additional mode of cell entry. e, Toledo DLBCL lymphocytes were incubated
with labelled pHLIP-anti155 at pH 6.2; fluorescence of a representative live cell
is overlaid on a bright field micrograph; scale bars, 2 µm. f, Flow cytometry
analysis of Toledo cells incubated with labelled pHLIP-anti155; cell association
was dependent on dose (top, pH 6.2) and pH (bottom, 500 nM dose).

g, Inhibition of miR-155 demonstrated by de-repression of a miR-155 dual 
luciferase sensor in KB cells. h, Inhibition of miR-21 demonstrated by 
de-suppression of luciferase expression in A549 cells transfected with a Renilla 
luciferase sensor. Data are shown as mean ± s.d., with n = 3; statistical analysis
performed with two-tailed Student’s t-test; **P < 0.01; ***P < 0.001. 
Extended Data Figure 3 | Pathology of the mir-155LSLtTA model of oncomiR
addiction. a, Organomegaly in representative diseased mir-155LSLtTA mice:
top, conspicuous lymphadenopathy seen in the (black arrow) cervical and
(white arrow) brachial lymph nodes; middle, enlarged exposed (white arrows)
cervical and (black arrows) axillary lymph nodes; bottom, enlarged (black
arrows) spleen. b, Histopathology of mir-155LSLtTA mice: H&E stain of an
enlarged spleen shows expansion of the white pulp by a nodular, neoplastic
infiltrate; staining of the spleen shows CD20+ and CD10+ B cells of follicular
centre origin. Analysis of enlarged lymph nodes indicates DLBCL with lymph
node architecture effaced by a confluent population of B220+ neoplastic
lymphocytes and a Ki-67 proliferative index at nearly 100%; n = 5. c, Tumour
regression due to DOX-induced miR-155 withdrawal in a subcutaneous
tumour model established from transplanted splenic mir-155LSLtTA
lymphocytes; time points indicate hours after initial administration of DOX. 
With a cancer phenotype that is relevant to human disease yet can be 
modulated by miRNA silencing, this is an excellent model for evaluating 
miR-155-targeted therapies. 
Extended Data Figure 4 | Experimental schematics for mouse tumour
studies. a, Workflow for treatment of the mir-155LSLtTA subcutaneous flank
model for the early endpoint and survival studies; day 1 indicates time of
first injection. For the ‘early treatment’ experiments in Fig. 3a, b, d-f, h
and Extended Data Fig. 5b, c, mice were treated on days 1 and 2 with
pHLIP-anti155, mock buffer, pHLIP-antiscr and anti155 only; fed DOX
starting on day 3; or treated with CHOP regimen on days 2-4. For survival
experiments in Fig. 3c, g and Extended Data Fig. 5a, mice were treated on days
1-3 with pHLIP-anti155, LNA against miR-155, and mock buffer. b, Workflow
for investigation of the mir-155LSLtTA model of lymphoma for the
biodistribution and miR-155 silencing studies. For experiments in Fig. 4a and
Extended Data Fig. 8a, b, mice were treated on day 1 with pHLIP-anti155,
anti155 only, and mock buffer. For experiments in Fig. 4b-d, h and Extended
Data Fig. 8c-g, mice were treated on day 1 and day 3 with pHLIP-anti155,
pHLIP-antiscr, and mock buffer, or fed DOX 16 h before harvest.
Extended Data Figure 5 | Administration of pHLIP-anti155 to mice with
subcutaneous lymphoma flank tumours. a, Fold change in tumour size in
response to miR-155 withdrawal and CHOP treatment (n = 3); arrow
represents initiation of DOX treatment (n = 3, food pellets enriched with DOX
at 2.3 g/kg, Bio-Serv), white triangle represents CHOP treatment (systemic
injection of cyclophosphamide at 40 mg/kg, doxorubicin at 3.3 mg/kg, and
vincristine at 0.5 mg/kg; oral gavage of prednisone at 0.2 mg/kg), grey triangles
represent maintenance administration of prednisone. b, Tumour growth
response to systemically administered antimiR treatment; symbols represent
intravenous injections of 2 (arrowhead) or 1 (arrow) mg kg⁻¹ of pHLIP-
conjugated antimiR-155, molar equivalent of phosphorothioated antimiR-155
LNA, or mock delivery solution; n = 5, data are shown as mean ± s.e.m.;
statistical comparison of pHLIP-anti155 to LNA performed with two-way
ANOVA; ***P < 0.001, ****P < 0.0001. c, Representative histological
analysis of kidneys (H&E, ×100 magnification) harvested from early endpoint
study, in which all of the mice from Fig. 3a and Extended Data Fig. 5a were
killed at the same time for analysis. Kidney sections reveal an absence of
microscopic changes in treated animals (pHLIP-anti155) that would be
indicative of renal toxicity (compare with normal renal sections in mock
control). d, Representative pHLIP-antiscr-treated mouse (top) with primary
flank tumour (white arrow) and enlarged inguinal lymph node (black arrow)
compared with an untreated mouse with no tumour (bottom). e, Measurement
of circulating lymphocytes in blood collected at time of death in early endpoint
study; dotted line denotes average level in nude mice with no tumour.
f, Although pHLIP interacts with the outer leaflet of lipid membranes, no
significant change in red blood cell (RBC) levels was detected after intravenous
treatment of mice with subcutaneous mir-155LSLtTA transplant tumours. This
supports the specificity of pHLIP treatments on cells of tumour origin since
pHLIP-antimiR treatments affect the levels of circulating lymphocytes
(Extended Data Fig. 5e); data are shown as mean ± s.d.
Extended Data Figure 6 | Toxicology assessment of intravenously
administered pHLIP-anti155 to C57BL/6J mice. a, Serum-based clinical
chemistry evaluation of systemic toxicity with focus on liver and kidney
function; dosing schedule consisted of injections of 2 mg kg⁻¹ of pHLIP-anti155
(and equimolar dose of LNA) on days 10 and 12, as well as 1 mg kg⁻¹
on day 11. Blood samples were serially harvested retro-orbitally on day 0
(10 days before start of treatment), as well as 1 day and 14 days after treatment.
b, Circulating white blood cell count collected 14 days after treatment.
c, Mouse mass throughout duration of the study. d, Organ mass normalized
to total body mass at time of harvest. a-d, For all analyses mock n = 4,
pHLIP-anti155 n = 5, and LNA n = 5; dotted lines indicate typical wild-type
values for C57BL/6J mice.


©2015 Macmillan Publishers Limited. All rights reserved 


[Figure panels a-d. Legend: mock, LNA, pHLIP-anti155. Horizontal axes in a and b: Time (days). Annotations: P = 0.0067; P = 0.0169; NS. Bar groups in c: day 8, day 13, day 24.]

Extended Data Figure 7 | Administration of pHLIP-anti155 to mice with
KB oral squamous cell carcinoma xenograft tumours. a, Intravenous
injection of pHLIP-anti155 (**) and phosphorothioated LNA against miR-155
(*) significantly enhanced survival compared with mock buffer treatment;
n = 4 for all groups; arrowheads indicate injections of 2 mg kg⁻¹ (or molar
equivalent for LNA). Survival cutoff criteria included tumour volume greater
than 1 cm³ or compassionate euthanasia, which was mandated for three
mock-treated mice with ulcerated tumours. b, Fold change in tumour size in
response to treatment; measurements were plotted until the mock negative
control group was euthanized. c, Tumour bioluminescence in response to
treatment; day 8 represents luciferase activity before first injection.
d, Representative images of tumour bioluminescence. Data are shown as
mean ± s.e.m.; statistical analysis performed with (a) Mantel-Cox analysis
or (c) two-tailed Student’s t-test, *P < 0.05; **P < 0.01.
Extended Data Figure 8 | Administration of pHLIP-anti155 to mir-155LSLtTA
mice with lymphoma. a, Quantification of liver distribution of
TAMRA-labelled PNA delivered with and without conjugation to pHLIP;
ImageJ was used to measure fluorescence from five confocal sections per mouse
liver; n = 3 mice per group. b, Visualization of whole liver fluorescence after
antimiR administration; pHLIP-anti155 liver fluorescence is similar to the
autofluorescence seen in the mock group. c, Lymph-node tumour burden
(A, axillary; B, brachial; C, cervical; I, inguinal lymph nodes); in these specific
images taken from diseased littermates, pHLIP-antiscr-treated mice had a more
than threefold larger aggregate lymph node mass (3.179 g) than
pHLIP-anti155-treated mice (1.006 g). d, e, Size of harvested (d) spleens (n = 4) and
(e) lymph nodes (axillary, brachial, cervical, and inguinal; n = 5) with respect to
wild type; n < 6 (that is, total number of treated mice) owing to size data
not collected. f, g, TUNEL analysis of treated cervical lymph nodes of
mir-155LSLtTA mice (n = 6). h, Percentage of white pulp in treated spleens;
n = 6. i, Measurement of lymphocyte infiltration into liver; n = 6.
j, Low-magnification H&E images of livers from Fig. 4d. k, Flow cytometry analysis
of B220-positive cells comprising the spleens of treated mice; B220 is typically a
marker for B cells, although varied expression is seen on some T cells, natural
killer cells, and macrophages; n = 4. l, Representative H&E image of healthy
kidneys from pHLIP-anti155-treated mice; n = 6. Data are shown as
mean ± s.d. (a, d, e, g, h) or mean ± s.e.m. (i); statistical analysis performed
with two-tailed Student’s t-test; **P < 0.01; ****P < 0.0001.
Extended Data Figure 9 | Differential gene expression analysis of miR-155
withdrawal. a, Experimental design for RNA-seq analysis of miR-155- 
addicted tumours compared with tumours undergoing miR-155 withdrawal 
and tumour regression. b, RNA-seq differential gene expression analysis of 
three independent tumours overexpressing miR-155 compared with three 
independent tumours undergoing DOX-induced miR-155 withdrawal; all 
differentially expressed genes with a false discovery rate less than 0.05 are 
shown; rows are clustered by Euclidean distance measure. c, KEGG pathway 
analysis of significantly upregulated genes after miR-155 withdrawal. 

d, Selection of potential miR-155 targets involved in tumour regression. 
Intersection of genes (group I) that are both predicted miR-155 targets 
(Supplementary Table 2) and overexpressed after miR-155 withdrawal from
mir-155LSLtTA tumours (Supplementary Table 1) with genes inferred from three
separate miR-155 target analyses. Group II: the study in ref. 36 used RNA-seq to
compare Mutu I B cells that overexpress miR-155 with cells transformed
with a control vector. Group III: ref. 25 identified shared targets between
miR-155 and a viral orthologue, miR-K12-11. Group IV: the study in ref. 37
used HITS-CLIP to identify miR-155 targets without perfect seed matches in
T cells. e, qPCR determination of gene expression levels in Toledo cells
treated for 48 h with 500 nM pHLIP-anti155 at pH 6.2; data are shown as
mean ± s.d.; n = 3; statistical analysis performed with two-tailed Student’s
t-test, *P < 0.05.
Extended Data Figure 10 | Expression levels of putative targets in response
to miR-155 silencing in mir-155LSLtTA mice. qPCR validation of potential
miR-155 targets involved in tumour regression using mir-155LSLtTA mice with
conspicuous lymphadenopathy treated with (black bars) DOX for 16 h
compared with (white bars) untreated mice with lymphadenopathy; all samples
are normalized to β-actin; n = 3. Genes were selected on the basis of criteria
described in Supplementary Table 3. As shown in Fig. 4f, both Bach1 and Mafb
have utility as biomarkers for miR-155 withdrawal-induced tumour regression.
Optogenetic control of organelle transport and positioning

Petra van Bergeijk1*, Max Adrian1*, Casper C. Hoogenraad1 & Lukas C. Kapitein1


Proper positioning of organelles by cytoskeleton-based motor pro- 
teins underlies cellular events such as signalling, polarization and 
growth’ *. For many organelles, however, the precise connection 
between position and function has remained unclear, because strate- 
gies to control intracellular organelle positioning with spatiotemporal 
precision are lacking. Here we establish optical control of intracel- 
lular transport by using light-sensitive heterodimerization to recruit 
specific cytoskeletal motor proteins (kinesin, dynein or myosin) to 
selected cargoes. We demonstrate that the motility of peroxisomes, 
recycling endosomes and mitochondria can be locally and repeat- 
edly induced or stopped, allowing rapid organelle repositioning. We 
applied this approach in primary rat hippocampal neurons to test 
how local positioning of recycling endosomes contributes to axon 
outgrowth and found that dynein-driven removal of endosomes from 
axonal growth cones reversibly suppressed axon growth, whereas 
kinesin-driven endosome enrichment enhanced growth. Our strat- 
egy for optogenetic control of organelle positioning will be widely 
applicable to explore site-specific organelle functions in different 
model systems. 

Eukaryotic cells use cytoskeletal motor proteins to control the trans- 
port and positioning of proteins, RNAs and organelles’. In neurons, 
mitochondria positioning contributes to synapse functioning and axon 
branching”’®, whereas positioning of Golgi outposts is thought to con- 
trol dendrite development’. Likewise, specific positioning of endosomes 
has been proposed to contribute to polarization and local outgrowth, 
either through selective delivery of building blocks or through localized 
signalling’*". In many cases, however, directly resolving the role of 
specific organelle positioning has remained challenging. Disruption of 
cytoskeletal elements and inhibition of motor proteins or adaptor mole- 
cules have been frequently used to alter organelle positioning, but these 
approaches often lack target selectivity as well as spatial specificity. There- 
fore, a tool to modulate locally the distribution of specific organelles 
with spatiotemporal accuracy is required. 

Using light-induced heterodimerization to recruit specific motors 
to selected cargoes might enable spatiotemporal control of intracellular 
transport, but whether such light-induced interactions can withstand 
motor-induced forces has remained unclear!*"*. To test this, we first used 
light-induced binding to couple microtubule-based motors to peroxi- 
somes in monkey COS-7 cells, because these vesicular organelles are 
largely immobile in the perinuclear region and any movement induced 
by light-targeted motor proteins could easily be observed’’. Peroxisomes 
were labelled using PEX-LOV, a fusion between the peroxisomal tar- 
geting signal of PEX3 and a photosensitive LOV domain from Avena 
sativa phototropin 1, which cages a small peptide that binds the engi- 
neered PDZ domain ePDZb1 after exposure to blue light"* (Fig. 1a, b). 
In addition, ePDZb1 was fused to the plus-end-directed kinesin-3 KIF1A 
to create KIF-PDZ. After co-expression of these two constructs and 
illumination with blue light, we observed the rapid redistribution of 
peroxisomes from the centre to the periphery of the cell where most 
microtubule plus-ends are located (Fig. 1c, d). Similarly, light-induced 
recruitment of minus-end-directed dynein using the amino terminus of 


BICD2 (BICDN) fused to ePDZb1 (BICDN-PDZ) triggered the accu- 
mulation of peroxisomes at the centre of the cells (Extended Data Fig. 
1a-c and Supplementary Video 1). Importantly, peroxisome redistribution
did not alter the spatial organization of mitochondria, the endo-
plasmic reticulum, or the actin and microtubule cytoskeleton (Extended 
Data Fig. 2a, b). 

To quantify peroxisome motility, we first used image correlation anal- 
ysis to measure the overall frame-to-frame similarity before and after 
exposure to blue light’®. In the absence of transport, two subsequent 
images are largely identical and the correlation index will be close to 1, 
whereas a value of 0 indicates that all organelles have moved to previ- 
ously unoccupied positions. After light-induced recruitment of KIF1A, 
the correlation index rapidly decreased from 0.97 ± 0.01 (mean ± s.e.m.)
Figure 1 | Local and reversible activation of microtubule-based transport
with light. a, b, Assay and constructs. MD, motor domain; NC, neck coil.
c, Peroxisome distribution before and after light-induced recruitment of
KIF-PDZ. d, Colour-coded overlay of time series. e, Displacement (black,
expressed in R90%) and correlation (frame-to-frame similarity from 0 to 1,
red) versus time (n = 6 cells, mean ± s.e.m.). Blue marks illumination.
f, g, Reversible activation using pulsed light. g, Maximum intensity projections
during periods of 40 s. See Supplementary Video 2. Time shown in
minutes:seconds. h, Displacement (black, R90%) and correlation (red) versus
time. i-l, Local activation using sequential illumination of four regions (i),
resulting in outward targeting to adjacent regions (j, showing example
trajectories), quantified using normalized fluorescence intensity (k, l, coloured
boxes mark blue-light illumination). See Supplementary Video 3. Scale bars,
10 µm.


1Cell Biology, Department of Biology, Faculty of Science, Utrecht University, 3584 CH Utrecht, The Netherlands. 


*These authors contributed equally to this work. 


5 FEBRUARY 2015 | VOL 518 | NATURE | 111 



to 0.76 ± 0.04, reflecting the induction of continuous peroxisome motility
(Fig. 1e). By contrast, dynein recruitment eventually increased the correlation
index, because most peroxisomes accumulated at the same
position in the centre of the cell (Extended Data Fig. 1c, d). To quantify
this overall peroxisome repositioning, we calculated for each time point
the radius of the circle required to enclose 90% of the fluorescence
intensity of the peroxisomes, R90%, and found a large increase from
14 ± 2 µm to 29 ± 3 µm on recruitment of KIF1A (Fig. 1e). By contrast,
R90% decreased from 15.4 ± 0.3 µm to 12.8 ± 0.6 µm on recruitment of
dynein (Extended Data Fig. 1d). Thus, rapid organelle redistribution
can be induced by using light to recruit microtubule motors.
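
The two readouts described above, the frame-to-frame correlation index and R90%, can be sketched on small intensity grids. This is an illustrative reconstruction and not the authors' analysis code. The normalization used for the correlation (a cosine-type similarity, which gives 1 for identical frames and 0 when no pixel is bright in both) and the choice of circle centre are assumptions.

```python
import math


def correlation_index(frame_a, frame_b):
    """Similarity of two equally sized intensity images (lists of rows).

    Returns 1 for identical frames and 0 when all signal has moved to
    previously unoccupied pixels.
    """
    num = sum(a * b for ra, rb in zip(frame_a, frame_b) for a, b in zip(ra, rb))
    norm_a = math.sqrt(sum(a * a for row in frame_a for a in row))
    norm_b = math.sqrt(sum(b * b for row in frame_b for b in row))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return num / (norm_a * norm_b)


def r90(frame, centre, fraction=0.9):
    """Smallest radius (in pixels) around `centre` (row, column) that
    encloses `fraction` of the total intensity."""
    cy, cx = centre
    pixels = sorted(
        (math.hypot(y - cy, x - cx), value)
        for y, row in enumerate(frame)
        for x, value in enumerate(row)
        if value > 0
    )
    total = sum(value for _, value in pixels)
    if total == 0:
        return 0.0
    running = 0.0
    for radius, value in pixels:
        running += value
        if running >= fraction * total:
            return radius
    return pixels[-1][0]


# Hypothetical 5 x 5 frames: signal clustered at the centre, then dispersed.
before = [[0] * 5 for _ in range(5)]
before[2][2] = 10
after = [[0] * 5 for _ in range(5)]
after[2][2], after[2][4] = 8, 2
print(correlation_index(before, after), r90(before, (2, 2)), r90(after, (2, 2)))
```

In this toy case outward movement lowers the correlation index and raises R90%, the same direction of change reported for kinesin recruitment.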

To achieve spatiotemporal control, recruitment of motors should be 
both reversible and locally restricted. To test the reversibility of motility 
induction, we exposed cells expressing KIF-PDZ and PEX-LOV to three 
consecutive periods of blue light, interspersed with ~7 min without blue- 
light exposure. Whereas peroxisomes moved outwards during blue-light 
illumination, movement was arrested within seconds without blue light 
(Fig. 1f, g and Supplementary Video 2). By contrast, R90% remained
stable without stimulation (Fig. 1h), indicating that peroxisomes do not 
spontaneously return to their original position after motor dissociation 
(Extended Data Fig. 3a). To test whether transport could be induced 
locally, we sequentially illuminated four different regions within a cell 
(Fig. 1i, j). Peroxisomes in the activated region rapidly redistributed to 
non-exposed areas, whereas non-exposed peroxisomes remained sta- 
tionary (Fig. 1j and Supplementary Video 3). The fluorescence inten- 
sity in the illuminated boxes 1-4 decreased by 60-75%, coinciding with 
a 180-280% increase in the adjacent peripheral boxes A-D (Fig. 1j-l). 
These results demonstrate that transport of intracellular cargo can be 
induced with spatiotemporal precision. 

We have previously shown that myosin-V can oppose kinesin-driven 
transport in actin-dense regions"®, suggesting that light-induced recruit- 
ment of myosin-V can be used to anchor organelles at specific sites. To 
test this, myosin-Vb was recruited to peroxisomes preloaded with the 
kinesin-2 KIF17 (refs 15, 16) (Fig. 2a, b). Whereas the attached kinesin
motor ensured continuous motility of many peroxisomes near the
cell periphery (Fig. 2c), this motility was arrested after recruitment of
myosin-Vb, resulting in a 30% increase of the correlation index (Fig. 2d).
Local illumination increased the correlation index to similar levels, but 
only in the exposed region (Fig. 2c, d). Moreover, individual peroxisome 
trajectories showed on average four times smaller frame-to-frame dis- 
placements during illumination compared to before and after stimula- 
tion (Fig. 2e, f). These data demonstrate that organelle motility can be 
stalled with spatiotemporal precision through light-induced recruit- 
ment of myosin-Vb. 

We next used RAB11-positive recycling endosomes to test our method
on organelles whose proper physiological functioning depends on selec- 
tive transport and positioning. Kinesin- and dynein-based redistribu- 
tion and myosin-Vb-based anchoring of recycling endosomes could 
be transiently induced with light (Extended Data Figs 2-4, see also Sup- 
plementary Information, Extended Data Figs 5 and 6 and Supplemen- 
tary Videos 4 and 5), demonstrating that the movement of intrinsically 
dynamic cargoes can be temporarily amplified or overruled by coupling 
these cargoes to a specific motor using light. Notably, whereas peroxisomes 
remained largely immobile at their new location after light-dependent 
repositioning, the original distribution of recycling endosomes was quickly 
restored after the light-induced kinesin recruitment was stopped (Ex- 
tended Data Fig. 3). 

To test our approach in a more complex and delicate model system, 
we switched to primary cultures of rat hippocampal neurons. Their polar- 
ized morphology and specialized cytoskeletal organization in different 
compartments, such as axons, dendrites and dendritic spines, should 
allow transporting cargoes into and out of these compartments by recruit- 
ing the appropriate motor proteins. We first examined whether light- 
induced recruitment of myosin-Vb was sufficient to drive transport 
into dendritic spines, as proposed previously’’ *°. Indeed, in cells co-
expressing PEX-LOV and a fusion of myosin-Vb with ePDZb1 (MYO-
PDZ), 62 ± 3% of the illuminated spines were targeted with peroxisomes
compared to 1 ± 1% of spines in non-illuminated dendrites (Fig. 2g-j
and Supplementary Video 6). After the illumination period, the number
of peroxisome entries decreased with a half-time of ~36 s (Fig. 2j).

Figure 2 | Light-induced myosin-Vb recruitment anchors organelles or
targets them into dendritic spines. a, b, Assay and constructs. CC, coiled coil.
c, Peroxisome distribution in cell expressing PEX-LOV, KIF-PEX and
MYO-PDZ. d, Correlation time trace for areas shown in c. e, Peroxisome
trajectories with 70-s episodes before, during and after myosin-Vb recruitment.
f, Frame-to-frame displacements of peroxisomes (5-s interval). Red denotes
the average of nine individual peroxisome trajectories. g, Peroxisome
distribution in primary hippocampal neuron expressing PEX-LOV and MYO-PDZ.
Dashed red rectangle was illuminated. h, Thirty-second maximum
projections of regions from g. Arrowheads mark peroxisomes in spines. See
Supplementary Video 6. i, Spine targeting in control (n = 12) and illuminated
(n = 17) dendrites, in three independent experiments. Mean ± s.e.m.,
***P < 0.0001, Mann-Whitney test. j, Spine entries over time. Mean ± s.e.m.,
*P < 0.05, ***P < 0.0001, Kruskal-Wallis analysis of variance (ANOVA),
Dunn’s post-hoc test, n = 17 dendrites. Inset: entry probability after
illumination (red) fitted with exponential decay exp(−t/τ) (black, τ = 36.36 s).
Scale bars, 5 μm (g, h) and 10 μm (c).

Similarly, RAB11 recycling endosomes could be enriched in specific
spines by local illumination (Extended Data Fig. 4o–q), demonstrating
that light-controlled transport can be used to manipulate individual 
dendritic spines. 

RAB11 vesicles have been implicated in the control of axon growth, but 
their local role in the growth cone could not be assessed previously’*?7". 
We therefore used local light-induced recruitment of motor proteins to
RAB11 recycling endosomes to test how local dynein-driven removal
or kinesin-driven addition of endosomes affects growth cone dynamics 
(Fig. 3a, d). Importantly, neither illumination nor addition of the het- 
erodimerizer rapalog (used to link the LOV domain to the N terminus 
of RAB11) altered growth cone structure or behaviour in cells expres- 
sing FRB-LOV and PDZ only (Extended Data Figs 5 and 7). Likewise, 
in control neurons expressing FKBP-RAB11 together with BICDN- 
PDZ or KIF-PDZ, but lacking the FRB-LOV protein, exposure to blue 
light did not affect the rapid filopodial and lamellipodial dynamics or 
the overall growth of most growth cones (Fig. 3b, e). When dynein was
coupled to RAB11, a clear decrease in growth cone dynamics and growth 
was observed (Fig. 3c, g, Extended Data Fig. 8 and Supplementary Video 7). 
By contrast, coupling of kinesin resulted in rapid axon extension in 
39 ± 7% of the growth cones (Fig. 3f, h and Supplementary Video 7).
Importantly, when growth cones were not completely collapsed upon 
dynein-dependent RAB11 depletion, this depletion and the reduced 
growth cone dynamics could be reversed when cells were no longer ex- 
posed to blue light (Fig. 3i, j). These data demonstrate that growth cone 
dynamics and axon growth directly depend on RAB11 vesicle func- 
tioning near the growth cone, rather than on general RAB11 functions 
elsewhere in the cell. 

Recently, the controlled anchoring and mobilization of mitochondria 
have emerged as key regulatory events in neurons****. Mitochon- 
drial positioning depends on both motor-dependent transport and con- 
trolled immobilization by specific docking factors, but the molecular and 
mechanical interplay between motors and docking factors has remained 

Figure 3 | Motor-based redistribution of recycling endosomes modulates
axon outgrowth. a, d, Assay and constructs. Rapalog targets FRB-LOV to the
RAB11 N terminus. b, c, e, f, Growth cone dynamics of neurons expressing
FKBP-RAB11 and BICDN-PDZ (b, c) or KIF-PDZ (e, f) without (b, e) or
with (c, f) FRB-LOV. See Supplementary Video 7. g, h, Light-induced
reduction of growth cone (GC) dynamics (g) or light-induced growth
enhancement (h). Dynein/kinesin: -FRB-LOV: n = 21/19 axons, +FRB-LOV:
n = 25/35 axons, in n = 5/5 independent experiments. Mean ± s.e.m.,
*P < 0.05, ***P < 0.0001, unpaired two-tailed t-test. i, Same growth cone in
low and high contrast illustrating reversibility of reduced FKBP-RAB11
targeting and growth cone dynamics. j, Area over time for the example
shown in i. Scale bars, 10 μm.

unclear**”*. For example, syntaphilin (SNPH) has been proposed to
induce anchoring by crosslinking mitochondria to microtubules and 
through a direct inactivating interaction with kinesin*”*, but whether 
remobilization requires the regulated release of both interactions is not 
known”. To test whether recruitment of more motors can overcome 
anchoring, we used light to recruit KIF-PDZ to axonal mitochondria 
labelled with TOM-LOV (Fig. 4a, b) and found that this was sufficient 
to mobilize most mitochondria in the illuminated region (Fig. 4c-g and 
Supplementary Video 8). Conversely, light-induced recruitment of the 
N-terminal part of SNPH was sufficient to acutely anchor motile mito- 
chondria, independent of their directionality (Fig. 4h—-j and Supplemen- 
tary Video 9). These results demonstrate that regulation of mitochondrial 
motility and anchoring does not require all-or-nothing switching between 
the activation and inactivation of specific motors, but instead depends 
on the balance of forces between active motors and passive anchors. 
We have established optically-controlled intracellular transport by 
using light-sensitive heterodimerization to recruit specific cytoskeletal 
motor proteins to selected cargoes. Our ability to control organelle posi- 
tioning complements recent work that established optogenetic control 
over nucleocytoplasmic distribution of proteins”. We anticipate that 
this approach will be widely applicable to study how organelle posi- 
tioning controls cellular functioning, as demonstrated here for the role 

Figure 4 | Altering mitochondrial dynamics through recruitment of motors
and anchors. a, b, Assay and constructs. c, Axonal mitochondria before and
during KIF-PDZ recruitment. Arrowheads track individual mitochondria.
See Supplementary Video 8. d, Kymograph for axon shown in c, representative
for n = 6 axons. Blue box marks activation (1 min:15 s). e, Correlation over time
for region shown in c. f, Axonal mitochondria before and during local
illumination (blue box). See Supplementary Video 8. g, Relative fluorescence
intensity versus time in the illuminated (blue box in f) and the adjacent, distal
region (red box in f). h, Axonal mitochondria before and during SNPH
recruitment. Arrowheads track individual mitochondria. See Supplementary
Video 9. i, Kymograph for axon shown in h, representative for n = 5 axons.
Blue boxes mark activation (4 min:50 s). j, Correlation versus time for region in
Supplementary Video 9. Scale bars, 5 μm (c, f, h) and 10 μm (d, i).

of recycling endosomes in growth cone dynamics. In addition, it could
be used to control cellular processes such as polarization, signalling 
and outgrowth by depleting or accumulating cargo at specific sites. For 
example, increased axonal targeting of certain cargoes might promote 
axon regeneration after injury and provide novel insights into the mech- 
anisms contributing to regeneration failure or success, both in culture 
and in different animal models”. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

DNA constructs. The following constructs have been described: tagRFPt” (gift from
R. Tsien), pCIBN(deltaNLS)-pmGFP and pCry2PHR-mCherryN1 (ref. 13; Addgene,
plasmids 26867 and 26866), mid(SS/TM)-GFP-LOVpep and ePDZb1-mCherry"*
(Addgene, plasmids 34972 and 34981), TOM20-mCherry-GAI” (gift from T. Inoue),
HA-rab11a (ref. 31), Kif1a(1-489)-GFP-FRB, Kif5b(1-807)-GFP-FRB, MyoVb(1-
1090)-GFP-FRB, HA-BicD2(1-594)-FRB and Pex3(1-42)-mRFP"*, pGW2-pex26
and pGW2-Kif17-GFP-pex26 (ref. 16), GFP-SNPH” (gift from Z. Sheng), GFP-
RCP* (RAB11FIP1, gift from R. Prekeris), NPY-GFP** and GFP-MACF18 (ref. 34).
Cloning vectors and fluorescent tags. The constructs used in this study were cloned 
in the mammalian expression vectors pGW1-CMV, pGW2-CMV and/or pBactin’’. 
pBactin-GFP, pBactin-tagRFPt, pBactin-tagBFP and pBactin-iRFP were generated 
by ligating the fluorescent proteins in the SalI and SpeI sites of pBactin.

Tagging motor proteins, adaptors and anchors with CIBN and ePDZb1. To 
generate myoVb(1-1090)-GFP-CIBN and myoVb(1-1090)-GFP-ePDZb1, amino 
acids 1-1090 of myosin-Vb were cloned in the AscI and EcoRI sites of pBactin-GFP, 
and either CIBN or ePDZb1 was inserted downstream of GFP using a PCR-based 
strategy. Similarly, myoVb-iRFP-CIBN was made using the pBactin-iRFP vector 
backbone. Kif1a(1-383)-GFP-CIBN and Kif1a(1-383)-GFP-ePDZb1 were generated
by ligating amino acids 1-383 of mouse KIF1A in the AscI and SalI sites of
pBactin-GFP. Subsequently, PCR-amplified CIBN or ePDZb1 was inserted downstream
of GFP. Haemagglutinin (HA)-tagged HA-BicD2(1-500)-CIBN and
BicD2(1-500)-ePDZb1 were cloned by inserting PCR-amplified BicD2(1-500) (referred to
as BICDN in the main text) into the pBactin vector backbone. Subsequently, CIBN
and ePDZb1 were ligated downstream of BicDN. Kif5b(1-807)-GFP-ePDZb1 was
made by inserting PCR-amplified Kif5b(1-807) into the AscI and BamHI sites of a
GFP-ePDZb1 backbone. To create SNPH(45-748)-GFP-ePDZb1, PCR-amplified 
SNPH (forward primer: 5'-AGCGCTAAGCTTGCCACCATGGCCATGTCCCT 
GCAGGGAAG-3’ and reverse primer: 5’-GCCCTTGCTCACCATAGTCGACC 
CCACTACCACAGCCAGCAGATCCAC-3’) was inserted into a GFP-ePDZb1 
backbone using Cold Fusion cloning (System Biosciences). 

Tagging peroxisomes, RAB11 vesicles and mitochondria with LOVpep and 
Cry2PHR. To generate Pex3-mRFP-LOVpep (PEX-LOV), LOVpep, including a
9-amino-acid linker (GGSGGSGGS), was ligated in the AscI and SalI sites of pGW1-
pex3-mRFP. To make TOM20-mCherry-LOVpep, Pex3-mRFP was replaced by
TOM20(1-34)-mCherry using the HindIII and AscI sites. To create Cry2PHR-
tagRFPt-Rab11 and FKBP-tagRFPt-Rab11, Rab11a was introduced in the SpeI and
NotI sites of pBactin-tagRFPt. Subsequently, PCR-amplified FKBP or Cry2PHR
was ligated upstream of tagRFPt. FRB-tagBFP-LOVpep was made by inserting
LOVpep, including a 9-amino-acid linker, in the SpeI and NotI sites of pBactin-
tagBFP. Subsequently, PCR-amplified FRB was cloned upstream of tagBFP.
Other constructs. pJPA5-TfR-GFP (a gift from G. Banker) was cloned into a β-actin
vector. Membrane targeting of GFP was achieved by inserting the 40 most N-terminal
residues of the MARCKS protein with an additional palmitoylation site at residue 3
(ref. 35) into GW2-tagRFPt. To generate mRFP-actin, human cytoplasmic β-actin
was cloned from pEGFP-actin (Clontech) in the β-actin-mRFP vector.

Cell cultures and transfection. COS-7 cells were cultured in DMEM/Ham’s F10 
(1:1) medium containing 10% FCS and penicillin/streptomycin. Then, 2-4 days 
before transfection, cells were plated on 24-mm diameter coverslips. Cells were 
transfected with Fugene6 transfection reagent (Roche) according to the manu- 
facturer’s protocol and imaged one day after transfection. 

Primary hippocampal cultures were prepared from embryonic day 18 (E18) rat 
brains**. Cells were plated on coverslips coated with poly-L-lysine (30 mg ml⁻¹)
and laminin (2 mg ml⁻¹) at a density of 75,000 per well. Hippocampal cultures were
grown in Neurobasal medium (NB) supplemented with B27, 0.5 mM glutamine,
12.5 mM glutamate, and penicillin plus streptomycin. Hippocampal neurons were
transfected 48 h before imaging with Lipofectamine 2000 (Invitrogen). DNA (3.6 μg
per well) was mixed with 6.6 μl Lipofectamine 2000 in 400 ml NB, incubated for
30 min, and then added to the neurons in NB supplemented with 0.5 mM glutamine
at 37 °C in 5% CO2 for 60 min. Next, neurons were washed with NB and transferred
to the original medium at 37 °C in 5% CO2 for 2 days. Transport assays targeting
dendritic spines were imaged at day-in-vitro (DIV) 20-22 and growth cone or 
mitochondria assays were imaged at DIV 3-7. 

Live-cell image acquisition. Time-lapse live-cell imaging of COS-7 cells and hip- 
pocampal neurons was performed on a Nikon Eclipse TE2000E (Nikon) equipped 
with an incubation chamber (Tokai Hit; INUG2-ZILCS-H2) mounted on a motorized
stage (Prior)'*®. Coverslips (24 mm) were mounted in metal rings, immersed in
0.6 ml Ringer’s solution (10 mM HEPES, 155 mM NaCl, 5 mM KCl, 1 mM CaCl2,
1 mM MgCl2, 2 mM NaH2PO4 and 10 mM glucose, pH 7.4) or full medium (RAB11
imaging in COS-7 cells) or conditioned medium (neuron imaging), and maintained
at 37 °C and 5% CO2. Cells were imaged every 5, 10 or 30 s for 5-50 min using a
40× objective (Plan Fluor, numerical aperture (NA) 1.3, Nikon) and a Coolsnap
HQ2 CCD camera (Photometrics). Dense-core vesicles were imaged using a 100×
objective (Apo TIRF, 1.49 NA, Nikon) on an Evolve 512 EMCCD camera (Photometrics).
A mercury lamp (Osram) and filter wheel containing ET-GFP (49002),
ET-dsRed (49005), ET-mCherry (49008) and ET-GFPmCherry (59022) emission
filters (all Chroma) were used for excitation and for global activation. For global
activation, the GFP excitation filter was used to illuminate the sample for 100-
150 ms with every image acquisition during the periods of blue-light exposure. In
most experiments, the activation intensity was around 10 W cm⁻² and the total irradiance
was about 30 times higher than the minimum irradiance required for full
activation (see Extended Data Fig. 4h). These settings allowed us to monitor the 
dynamics of GFP-labelled proteins or growth cones during activation. 

For local illumination of specific areas using a 488-nm laser line, a FRAP scan- 
ning head was used (FRAP L5 D-CURIE, Curie Institute). Compared to standard 
FRAP experiments the laser was used at much lower intensities. 

Live-cell spinning disk confocal microscopy of growth cones and spines of hip- 
pocampal neurons was performed on a Nikon Eclipse-Ti (Nikon) microscope with 
a Plan Apo VC, 60×, 1.40 NA oil objective (Nikon). The microscope is equipped
with a motorized stage (ASI; PZ-2000), a Perfect Focus System (Nikon), an incubation
chamber (Tokai Hit; INUG2-ZILCS-H2) and uses MetaMorph 7.7.11 software
(Molecular Devices) to control the camera and all motorized parts. Confocal
excitation and detection is achieved using 100 mW Vortran Stradus 405 nm,
100 mW Cobolt Calypso 491 nm and 100 mW Cobolt Jive 561 nm lasers and a
Yokogawa spinning disk confocal scanning unit (CSU-X1-A1N-E; Roper Scientific)
equipped with a triple-band dichroic mirror (z405/488/568trans-pc; Chroma) and a
filter wheel (CSUX1-FW-06P-01; Roper Scientific) containing 4′,6-diamidino-2-
phenylindole (DAPI; ET-DAPI (49000)), GFP (ET-GFP (49002)) and mCherry
(ET-mCherry (49008)) emission filters (all Chroma). Confocal images were acquired
with a QuantEM:512 SC EMCCD camera (Photometrics) at a final magnification
of 110 nm per pixel, including the additional 2.5× magnification introduced by an
additional lens mounted between scanning unit and camera (VM Lens C-2.5X;
Nikon). Local activation of photo-heterodimerization was achieved with an ILas 
FRAP system (Roper Scientific France/ PICT-IBiSA, Institut Curie) and 491 nm 
laser line at low power. To couple FRB-LOV to FKBP-RAB11, rapalog (AP21967, 
ARIAD) was dissolved to 0.1 mM in ethanol. Then 20 min before imaging, 0.2 ml 
of culture medium with rapalog (400 nM) was added to establish a final rapalog 
concentration of 100 nM. 
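
The rapalog addition described above is a plain dilution, and the arithmetic can be checked with a short sketch. The function names are illustrative, and the 0.6 ml starting volume used in the usage line is an assumption: it is the volume that makes the stated numbers consistent (and matches the imaging volume quoted for Ringer's solution), but the medium volume present at this step is not stated explicitly.

```python
def final_concentration(c_added_nM, v_added_ml, v_existing_ml):
    """Concentration after adding v_added_ml at c_added_nM to rapalog-free medium."""
    return c_added_nM * v_added_ml / (v_added_ml + v_existing_ml)


def existing_volume_for_target(c_added_nM, v_added_ml, c_target_nM):
    """Volume of rapalog-free medium that yields c_target_nM after the addition."""
    return v_added_ml * (c_added_nM / c_target_nM - 1.0)


# Assumed 0.6 ml of medium already on the coverslip (not stated for this step).
print(final_concentration(400.0, 0.2, 0.6))
print(existing_volume_for_target(400.0, 0.2, 100.0))
```

Under that assumption, adding 0.2 ml at 400 nM is a fourfold dilution, which is what the stated 100 nM final concentration requires.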

Image processing and analysis. Images of live cells were processed and analysed 
using MetaMorph (Molecular Devices), LabVIEW (National Instruments) soft- 
ware and ImageJ (NIH). If not followed by a quantification in a subsequent panel, 
representative images are representative of 60-90% of the cells studied in the same 
conditions, with at least five responding cells per condition (except for Extended 
Data Fig. 4f with three responding cells, because we used the system in Extended 
Data Fig. 4i, j for all subsequent experiments). The exact organelle distributions and 
dynamics mostly depended on the levels of protein overexpression, which could 
not be examined before the experiment without triggering heterodimerization. For 
example, if the motor were poorly expressed, less redistribution was observed. This 
was most apparent in experiments where three or more constructs were co-expressed,
some of which lacked a fluorescent marker that could be used to confirm expression
of the motor.

Quantification of redistribution dynamics. Before analysis, cells were masked to 
exclude contributions from neighbouring cells to the analysis. For the colour-coded 
redistribution plots, all images of a time-lapse recording were thresholded at ~5- 
20 times the standard deviation of the background above the background to yield 
binary images that were subsequently overlaid non-transparently starting with the 
final frame (first frame on top) in Fig. 1d, and starting with the first frame (last frame 
on top) in Fig. 3c. Each frame was coloured using a time-coded gradient that ran 
from orange to white before and from white to green after blue-light illumination. 
To quantify the radial redistribution of peroxisomes upon recruitment of (additional)
motor proteins, the radius required to include 90% of the total intensity of
the cell, $R_{90\%}(t)$, was calculated for each frame as described previously”’.
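
The thresholding and radial measure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the analysis code used in the study: the function names, the choice of `center` (the reference point for the radius is defined in the earlier work cited above, not here) and the default `k` are assumptions.

```python
import numpy as np


def binarize(frame, background, k=5.0):
    """Threshold a frame at k standard deviations above the mean background."""
    threshold = background.mean() + k * background.std()
    return frame > threshold


def radius_enclosing_fraction(frame, center, fraction=0.9):
    """Smallest radius around `center` (row, col) that contains `fraction`
    of the total intensity of a (masked, background-corrected) frame."""
    rows, cols = np.indices(frame.shape)
    radii = np.hypot(rows - center[0], cols - center[1]).ravel()
    weights = np.asarray(frame, dtype=float).ravel()
    order = np.argsort(radii)                 # pixels sorted by distance
    cumulative = np.cumsum(weights[order])    # intensity enclosed so far
    idx = np.searchsorted(cumulative, fraction * cumulative[-1])
    return radii[order][idx]
```

Calling `radius_enclosing_fraction` on every frame of a recording gives a time trace of the radius; a value of `k` between 5 and 20 corresponds to the threshold range quoted in the text.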

To quantify changes in the dynamics of peroxisomes or RAB11 vesicles upon
recruitment of (additional) motor proteins, we calculated the time-dependent
frame-to-frame correlation index $c_\tau(t)$'* by first calculating the integrated intensity of
the image obtained by multiplying the frames acquired at $t$ and $t + \Delta t$, that is,
$\sum_{x,y} i(x, y, t)\, i(x, y, t + \Delta t)$, in which $i(x, y, t)$ is the intensity at pixel $(x, y)$ of the
frame acquired at time $t$. These values can then be normalized using either the …
$c_\tau(t)$ will be 1 if the particles are completely anchored and their positions unchanged
after a time $\tau$, whereas $c_\tau(t)$ will be 0 if all particles moved to previously unoccupied
locations. In practice, $c_\tau(t)$ will remain finite even in very dynamic samples, because
a subset of particles will move to locations that were occupied by different particles
in the first image. In all our analyses, we used frame-to-frame differences. For analysing
the correlation index in small regions (Fig. 2d and Extended Data Fig. 4l),
measurements were averaged over six adjacent time points.
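
A frame-to-frame correlation index of this kind can be sketched as follows. The normalization used here, the geometric mean of the two frames' summed squared intensities, is an assumption chosen because it gives 1 for identical frames and 0 for non-overlapping ones; the normalization actually used is not recoverable from the text above. Function names are illustrative.

```python
import numpy as np


def correlation_index(frame_a, frame_b):
    """Summed pixel-wise product of two frames, normalized to lie in [0, 1]
    for non-negative images (assumed geometric-mean normalization)."""
    a = np.asarray(frame_a, dtype=float)
    b = np.asarray(frame_b, dtype=float)
    norm = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / norm) if norm > 0 else 0.0


def correlation_trace(stack, window=1):
    """Index for each pair of consecutive frames in a (time, y, x) stack,
    optionally smoothed with a moving average over `window` time points."""
    values = np.array([correlation_index(stack[t], stack[t + 1])
                       for t in range(len(stack) - 1)])
    if window > 1:
        values = np.convolve(values, np.ones(window) / window, mode="valid")
    return values
```

Setting `window=6` mirrors the averaging over six adjacent time points mentioned for small regions.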

To determine local changes in fluorescence intensities over time (Fig. 1i–l), the
mean grey value of the first-frame-subtracted recording (Fig. 1l), or last-frame-
subtracted recording (Fig. 1k) was measured, and the maximum was set to 1. Individual
peroxisome or RAB11 vesicle trajectories were obtained using the MTrackJ
plugin in ImageJ”.
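
The local intensity measurement can be illustrated with a short sketch. It assumes a `(time, y, x)` image stack and a boolean region mask; the function and argument names are hypothetical and the code only mirrors the steps described above (reference-frame subtraction, mean grey value in the region, maximum set to 1).

```python
import numpy as np


def normalized_intensity_trace(stack, mask, reference="first"):
    """Mean grey value inside `mask` after subtracting the first or last
    frame, rescaled so that the maximum of the trace equals 1."""
    stack = np.asarray(stack, dtype=float)
    ref = stack[0] if reference == "first" else stack[-1]
    diff = stack - ref                       # reference-frame-subtracted recording
    trace = diff[:, mask].mean(axis=1)       # mean over masked pixels, per frame
    peak = trace.max()
    return trace / peak if peak > 0 else trace
```

Using `reference="first"` suits regions that gain signal during the recording, whereas `reference="last"` suits regions that lose it.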

Analysis of spine entries. Peroxisomes and RAB11 vesicles were imaged at 1-s 
intervals, preferably in two dendrites of the same neuron, of which one was illumi- 
nated with pulses of 491 nm light directly before the frames indicated. Spine entries 
during periods of 100 s before, 100 s during and 200 s after illumination were detected 
manually using the Cell Counter plugin in ImageJ to determine the fraction of cargo-
targeted spines and the frame of spine entry. The mean percentage of spines targeted
with peroxisomes was compared with a Mann-Whitney test (Fig. 2i), and the mean 
percentages of peroxisome and endosome entries were subjected to a Kruskal-Wallis 
ANOVA with Dunn’s post-hoc test (Fig. 2j) or a one-way ANOVA with Bonferroni’s 
post-hoc test (Extended Data Fig. 4p), respectively. The half-time of peroxisome 
entries into dendritic spines after illumination was estimated by fitting a single 
exponential function (R² = 0.9942) through the inverted cumulative histogram of
the observed entry events after 491 nm illumination was stopped. 
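
The fitting step can be sketched as below. The inverted cumulative histogram is taken here to be the fraction of entry events that have not yet occurred at each time point, and the exponential is fitted by linear regression on the logarithm; both the binning and the log-linear fit are assumptions for illustration, since the fitting procedure is not specified beyond a single exponential. Function names are hypothetical.

```python
import numpy as np


def inverted_cumulative(entry_times, bin_edges):
    """Fraction of entry events still to come at the right edge of each bin."""
    counts, _ = np.histogram(entry_times, bins=bin_edges)
    remaining = 1.0 - np.cumsum(counts) / counts.sum()
    return np.asarray(bin_edges, dtype=float)[1:], remaining


def fit_exponential_decay(t, y):
    """Fit y = A * exp(-t / tau) by log-linear least squares.
    Returns (tau, half_time), with half_time = tau * ln 2."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y > 0                              # log is undefined at zero
    slope, _ = np.polyfit(t[keep], np.log(y[keep]), 1)
    tau = -1.0 / slope
    return tau, tau * np.log(2.0)
```

Note that the time constant and the half-time of a single exponential differ by a factor of ln 2, so it matters which of the two a reported value refers to.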

Analysis of axon growth and growth cone area. Axonal growth was manually 
tracked using the MTrackJ plugin in ImageJ’’. The percentages of growth cones
exhibiting light-induced reduction in dynamics or growth enhancements were compared
using unpaired two-tailed t-tests (Fig. 3g, h). In all our experiments, only the
RFP channel was available for imaging without triggering photo-heterodimerization
before, during and after exposure to blue light. We used this channel to image FKBP-
tagRFPt-RAB11 to verify that light-controlled recruitment of BICDN induced the
removal of RAB11 endosomes (Extended Data Fig. 8a, b). FKBP-RAB11 was enriched
at vesicle-like structures, whose dynamics altered upon light-dependent recruitment
of BICDN to recruit dynein. In addition, FKBP-RAB11 diffusely filled the
axon, which could be used to determine axon morphology and size with precision
comparable to a cytoplasmic GFP fill (Extended Data Fig. 8c). We counted the
positive pixels in a binarized image obtained by thresholding the median-filtered
tagRFPt image, followed by two erosions and closure**. Because tagRFPt fluorescence
of this construct showed a threefold increase in intensity upon 491 nm
excitation (see Extended Data Fig. 8a), we established a dynamic threshold $T$ that
scaled with the maximum intensity of the object, that is, $T = I_{bg} + \sigma_{bg} + 0.02\,I_{max}$,
in which $I_{bg}$ and $\sigma_{bg}$ are the average and standard deviation of the intensity in an
area outside the axon, respectively, and $I_{max}$ is the average of the top 2% intensity
values above $I_{bg} + \sigma_{bg}$. Using these parameters, changes in area are independent of
the changes in intensity upon exposure to blue light or due to dynein-mediated 
removal of RAB11 vesicles (see Extended Data Fig. 8d, e). 
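
The dynamic threshold can be sketched as follows, assuming a 2D image and a boolean mask marking a background area outside the axon. The median filtering, erosions and closure mentioned above are deliberately omitted, and the function names are hypothetical; the sketch only illustrates why a threshold that scales with object brightness makes the measured area insensitive to a global change in intensity.

```python
import numpy as np


def dynamic_threshold(image, background_mask):
    """T = I_bg + sigma_bg + 0.02 * I_max, with I_max the mean of the top 2%
    of pixel values lying above I_bg + sigma_bg."""
    image = np.asarray(image, dtype=float)
    background = image[background_mask]
    i_bg, sigma_bg = background.mean(), background.std()
    above = image[image > i_bg + sigma_bg]
    if above.size == 0:                       # no object in the image
        return i_bg + sigma_bg
    n_top = max(1, int(np.ceil(0.02 * above.size)))
    i_max = np.sort(above)[-n_top:].mean()
    return i_bg + sigma_bg + 0.02 * i_max


def area_in_pixels(image, background_mask):
    """Number of pixels above the dynamic threshold (morphological clean-up omitted)."""
    image = np.asarray(image, dtype=float)
    return int(np.count_nonzero(image > dynamic_threshold(image, background_mask)))
```

In this sketch, tripling the brightness of an object on a dark background leaves `area_in_pixels` unchanged, because the threshold rises in proportion.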

Relative decreases in growth cone RAB11-FKBP signal were calculated by rescal- 
ing all intensity values normalized initially to t_5.30 min to the average intensity value 
of -FRB-LOV control growth cones at tg min (see Extended Data Fig. 8a). To calculate 
changes in growth cone area before blue-light illumination, we compared single 
growth cone area values averaged over three frames at t0 min and t−4 min relative to
illumination onset (see Extended Data Fig. 8g, h). Analogously, comparing values
at tg min and t0 min shows net growth during blue-light illumination (see Extended
Data Fig. 8i, j). All of these results were compared using Mann-Whitney tests (Ex- 
tended Data Fig. 8b, g, i). All statistical testing was performed in GraphPad Prism 
5 software. No statistical method was used to predetermine sample size. 
Immunofluorescence cell staining, imaging and antibodies. COS-7 cells (1 day 
after transfection) or primary hippocampal neurons (2 days after transfection) were
either kept in the dark or illuminated for 10 min using a blue light-emitting diode
mounted in the incubator. Afterwards, cells were fixed at room temperature for 
10 min with 4% paraformaldehyde (PFA), 4% sucrose. For detection of EB1, cells 
were fixed for 5 min at —20 °C in 100% ice-cold methanol supplemented with 1 mM 
EGTA, followed by 5 min post-fixation at room temperature in 4% PFA, 4% sucrose. 
After fixation, cells were washed three times in PBS and incubated overnight at 
4 °C in GDB buffer (0.1% BSA, 450 mM NaCl, 0.3% Triton X-100 and 16.7 mM
phosphate buffer, pH 7.4) containing the primary antibody. The next day, cells were 
washed three times for 10 min with PBS, followed by a 1-h incubation at room tem- 
perature with the secondary antibody in GDB buffer. After washing cells three 
times for 10 min in PBS, slides were mounted in Vectashield mounting medium 
(Vector Laboratories). Images were taken with a Nikon Eclipse 80i upright fluorescence
microscope and a Coolsnap HQ2 CCD camera (Photometrics), using a
40× oil objective (Plan Fluor, NA 1.3), 60× oil objective (Plan Apo VC, NA 1.4) or
100× oil objective (Plan Apo VC, NA 1.4).

Antibodies and reagents used: mouse anti-Cytochrome c (6H2.B4, 556432, BD
Biosciences), mouse anti-PDI (RL90, MA3-019, Affinity BioReagents), phalloidin-
Alexa647 (A22287, Invitrogen), mouse anti-alpha tubulin (B-5-1-2, T-5168, Sigma),
mouse anti-EB1 (610535, BD Transduction), mouse anti-Lamp1 (this antibody,
developed by J. T. August and J. E. K. Hildreth, was obtained from the hybridoma
bank, created by the NICHD of the NIH and maintained at The University of Iowa, 
Department of Biology), mouse anti-EEA1 (BD Transduction), rabbit anti-RAB11 
(71-5300, Invitrogen), rabbit anti-Homer-1 (160-002, SySy), Alexa 488-, Alexa 568-, 
Alexa 647-conjugated secondary antibodies (Invitrogen). 
GFP pull-down and western blotting. HEK cells were cultured in DMEM/Ham’s 
F10 (1:1) medium containing 10% FCS and penicillin/streptomycin. Then 1 day 
after plating, HEK293T cells were transfected using polyethylenimine (PEI; Poly- 
sciences). After 24 h, GFP beads (GFP-Trap_M, Chromotek) were washed in wash- 
ing buffer (TBS, 0.5% Triton X-100 and protease inhibitor) and incubated for 1 h 
in blocking buffer (TBS, 0.5% Triton X-100, 2% glycerol, 2% chicken egg white). 
Meanwhile, cells were collected in ice-cold TBS, pelleted and lysed in extraction 
buffer (TBS, 0.5% Triton X-100, protease inhibitor, phosphatase inhibitor (Roche),
100 μM GTPγS, 5 mM MgCl2, pH 8.0). Cell lysates were centrifuged for 15 min at
4 °C at 12,000g, followed by a 1.5-h incubation of the supernatants with the washed
GFP beads, while rotating at 4 °C. Beads were collected using a magnetic separator 
and washed four times. Samples were eluted in SDS sample buffer, boiled for 5 min 
and loaded onto SDS-PAGE gels and blotted on PVDF membranes (Millipore). 
Blots were blocked in 5% milk in PBST (0.1% Tween in PBS) and incubated overnight
at 4 °C (primary antibody) or for 1 h at room temperature (secondary antibody
conjugated to horseradish peroxidase) in PBST. Finally, blots were developed
using enhanced chemiluminescent western blotting substrate (Pierce). 

Antibodies used: rabbit anti-TagRFPt (ab234, Evrogen), rabbit anti-GFP (ab290, 
Abcam) and anti-rabbit IgG antibody conjugated to horseradish peroxidase (DAKO).

Extended Data Figure 1 | Optical control of dynein-based cargo motility.
a, b, Assay and constructs. A fusion construct of PEX3, monomeric red
fluorescent protein (mRFP) and LOVpep (PEX-LOV) targets peroxisomes.
After blue-light illumination, a fusion of the N terminus of the dynein adaptor
BICD2 and ePDZb1 (BICDN-PDZ) is recruited to peroxisomes. c, Peroxisome
distribution in a COS-7 cell expressing PEX-LOV and BICDN-PDZ before
and during light-induced recruitment of dynein (inverted contrast). Red lines
indicate cell outline. Scale bar, 10 μm. See Supplementary Video 1. d, Black:
time trace of $R_{90\%}$ (radius of circle enclosing 90% of cellular fluorescence; see
Methods) in cells expressing PEX-LOV and BICDN-PDZ (n = 5 cells). Red:
correlation index (frame-to-frame differences in the peroxisome recordings;
see Methods) of the same cells. Blue-light illumination is indicated in blue;
mean ± s.e.m.

Extended Data Figure 2 | Light-induced organelle redistribution is
organelle-specific and does not affect the cytoskeleton. a, Images of fixed
cells expressing PEX-LOV and KIF-PDZ, showing the distribution of
peroxisomes and mitochondria (anti-cytochrome-c), or peroxisomes and the
endoplasmic reticulum (anti-protein disulfide isomerase (PDI)) in the absence
(left) or presence (right) of blue light. b, Images of fixed cells expressing
PEX-LOV and KIF-PDZ, showing the distribution of peroxisomes and
phalloidin, α-tubulin or EB1 staining in the absence or presence of blue light.
c, Images of fixed cells expressing FKBP-RAB11, FRB-LOV and KIF-PDZ,
showing the distribution of RAB11 recycling endosomes together with
lysosomes (anti-Lamp1) or early endosomes (anti-EEA1) in the absence or
presence of blue light. Red lines indicate cell outline. Scale bars, 10 μm.

Extended Data Figure 3 | After light-induced organelle displacement,
peroxisomes remain at their newly obtained position whereas the
distribution of recycling endosomes quickly reverses back to normal.
a, Peroxisome distribution before and after exposure to blue light in cells
expressing PEX-LOV and KIF-PDZ. Blue-light illumination was terminated
at t0:00. b, Distribution of RAB11 recycling endosomes before and after
exposure to blue light in cells expressing FKBP-RAB11, FRB-LOV and KIF-PDZ.
Blue light was turned off at t0:00. Red lines indicate cell outline. Scale bars,
10 μm. See Supplementary Video 4.
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Extended Data Figure 4 | Spatiotemporal control of recycling endosome 
distribution and dynamics. a, b, Assay and constructs. A fusion construct of 
Cry2PHR, tagRFPt and RAB11, (CRY-RAB11) targets RAB11 recycling 
endosomes. After blue-light illumination, a fusion of truncated KIF1A, GFP 
and CIBN (KIF-CIBN) or a fusion of truncated BICDN, GFP and CIBN 
(BICDN-CIBN) is recruited to RAB11 recycling endosomes. c, RAB11 vesicle 
distribution before and after light-induced recruitment of KIF1A (inverted 
contrast). Red lines indicate cell outline. Scale bar, 10 pm. d, Overlay of 
sequential binarized images from the recording in c, colour-coded by time as 
indicated. Orange marks the initial distribution of RAB11 vesicles, whereas 
green marks regions targeted after exposure to blue light. e, Time trace of the 
Reo and Rogy, (black) and the correlation index (red) of the cell shown in c and 
d. Blue box marks blue-light illumination. f, RAB11 distribution in a cell 
expressing CRY-RAB11 and BICDN-CIBN before and after blue-light 
illumination (inverted contrast). Red lines indicate cell outline. Scale bar,
10 µm. g, Time trace of the Regu, and Roos, (black) and correlation index for the
cell shown in f. h, Irradiance response curve for cells transfected with CRY-PEX
and KIF-CIBN (red), or PEX-LOV plus KIF-PDZ (black). To exclude
activation failure due to poorly expressed motors, the number of cells reacting
at each intensity was divided by the number of cells responding to
subsequent high irradiance (~1.3 W cm⁻²). Three biological replicates. Cells
per intensity (for increasing intensities): 28, 21, 22, 20, 24, 22 and 20 for CRY,
30, 28, 33, 31, 28, 33, 33, 32 and 26 for LOV. Error bars depict s.e.m.; three
biological replicates. Solid line shows fit to $R = 100\,I^{n} / (I_0^{n} + I^{n})$, with R the
response, I the illumination intensity, I_0 the intensity at which the response is
50%, and n the Hill coefficient. For CRY-PEX and PEX-LOV, I_0 is 0.05 and
0.12 W cm⁻², respectively. i, j, Assay and constructs. A fusion construct of
FKBP, tagRFPt and RAB11 (FKBP-RAB11) targets RAB11 recycling 
endosomes. Rapalog addition couples FKBP to FRB, leading to recruitment of 
the FRB, tagBFP and LOVpep fusion protein (FRB-LOV). After blue-light 
illumination a fusion of truncated myosin-Vb, GFP and ePDZb1 (MYO-PDZ) 
is recruited to RAB11 vesicles. k, RAB11 distribution in a cell expressing 
FKBP-RAB11, FRB-LOV and MYO-PDZ before sequential blue-light 
illumination of the regions marked with numbered boxes (inverted contrast). 
Scale bar, 10 µm. See Supplementary Video 5. l, Time traces of the correlation
index in the areas shown in k. Blue box marks whole-cell exposure to blue
light, whereas coloured boxes indicate local illumination. m, Example
trajectories of two RAB11 recycling endosomes before, during and after
recruitment of myosin-Vb, as indicated. Data were acquired at 1-s intervals.
For each period, 40 s are shown. n, Frame-to-frame displacements of RAB11
recycling endosomes before, during and after light-induced recruitment of
myosin-Vb (5 s interval). Thick lines show the average of five tracks in shades of
grey. o, FKBP-RAB11 distribution (inverted contrast) in a dendrite and
dendritic spines before, during and after blue-light illumination. Images are
maximum projections spanning 60 s. Red lines indicate cell outline, arrowheads
mark spines targeted with recycling endosomes during blue-light illumination.
Scale bar, 2 µm. p, Percentage of recycling endosome spine entry events per
dendrite before, during and after illumination in bins of 100 s. Blue box
indicates blue-light illuminated interval, n = 16 dendrites in three independent
experiments. Red bar denotes mean ± s.e.m., ***P < 0.0001, one-way
ANOVA, Bonferroni’s post-hoc test. q, Histogram of fraction of all (n = 237)
recycling endosome spine entries in bins of 20 s. Blue box indicates
blue-light-illuminated interval.
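
The irradiance response fit described for panel h can be explored numerically. The sketch below is only an illustration, assuming the fit has the Hill form R = 100·I^n / (I_0^n + I^n) and using the half-response intensities quoted in the legend (0.05 and 0.12 W cm⁻²). The function name `hill_response` and the value `N_HILL = 2.0` are hypothetical, because the legend does not report the fitted Hill coefficient.

```python
def hill_response(intensity, i0, n):
    """Percentage of responding cells, R = 100 * I^n / (I0^n + I^n).

    intensity and i0 must share the same unit (W cm^-2 in the legend).
    """
    if intensity < 0 or i0 <= 0:
        raise ValueError("intensity must be >= 0 and i0 must be > 0")
    return 100.0 * intensity**n / (i0**n + intensity**n)


# Half-response intensities quoted in the legend (W cm^-2).
I0_CRY = 0.05
I0_LOV = 0.12

# Hypothetical Hill coefficient: NOT reported in the legend.
N_HILL = 2.0

for label, i0 in (("CRY-PEX", I0_CRY), ("PEX-LOV", I0_LOV)):
    for intensity in (0.01, i0, 1.3):
        r = hill_response(intensity, i0, N_HILL)
        print(f"{label}: I = {intensity:.2f} W cm^-2 -> R = {r:.1f}%")
```

Whatever value of n is chosen, the response at I = I_0 is 50%, which is how I_0 is defined in the legend; n only changes how steeply the curve rises around that point.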

Extended Data Figure 5 | Rapalog in the nanomolar range is sufficient to
recruit FRB-LOV to FKBP-RAB11 and does not affect the number of
spines or growth cones in hippocampal neurons. a, Response curve of RAB11
recycling endosome relocalization in cells expressing FKBP-RAB11, FRB-LOV
and KIF-PDZ exposed to blue light in relation to rapalog concentration.
To exclude activation failure due to poorly expressed motors, the number of
cells reacting at each concentration was divided by the number of cells
responding to subsequent high rapalog concentration (1 µM). Solid line shows
fit to $R = (R_{\min} c_0^{n} + 100\,c^{n}) / (c_0^{n} + c^{n})$, with R the response, c the rapalog
concentration, c_0 the concentration at which the response is 50%, n the Hill
coefficient, and R_min the response at 0 mM rapalog. R_min is 22% and c_0 is 15 nM.
n = 30 (0.1 nM), 37 (1 nM), 30 (10 nM), 28 (100 nM) and 28 (500 nM)
responsive cells from three independent experiments. Error bars depict s.e.m.
b, Hippocampal neurons transfected with membrane-GFP incubated for
2.5 h in the presence or absence of 100 nM rapalog, co-stained with the
post-synaptic marker Homer. c, Quantification of the number of Homer
puncta per 100 µm dendrite length in the presence or absence of 100 nM
rapalog (n = 13 neurons per condition). Error bars depict s.e.m.
d, Hippocampal neurons transfected with GFP incubated for 2.5 h in the
presence or absence of 100 nM rapalog, co-stained with phalloidin.
e, Quantification of the number of growth cones per 50,000 µm² in the presence
or absence of 100 nM rapalog, co-stained with phalloidin. n = 19. Scale bars,
5 µm. Error bars depict s.e.m.
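
The rapalog dose-response fit in panel a can be sketched in the same spirit. The snippet below assumes the fit is a Hill curve with a non-zero baseline, R = (R_min·c_0^n + 100·c^n) / (c_0^n + c^n), and uses the values quoted in the legend (R_min = 22%, c_0 = 15 nM). The function name `rapalog_response` and the default `n = 1.0` are hypothetical, as the fitted Hill coefficient is not given.

```python
def rapalog_response(c_nM, c0_nM=15.0, r_min=22.0, n=1.0):
    """Hill curve with baseline: R = (Rmin*c0^n + 100*c^n) / (c0^n + c^n).

    c_nM, c0_nM: rapalog concentrations in nM.
    r_min: response (%) without rapalog, 22% in the legend.
    n: Hill coefficient (hypothetical default, not reported in the legend).
    """
    if c_nM < 0 or c0_nM <= 0:
        raise ValueError("c_nM must be >= 0 and c0_nM must be > 0")
    return (r_min * c0_nM**n + 100.0 * c_nM**n) / (c0_nM**n + c_nM**n)


# Concentrations listed in the legend (nM), plus zero as the baseline.
for c in (0.0, 0.1, 1.0, 10.0, 100.0, 500.0):
    print(f"{c:6.1f} nM rapalog -> R = {rapalog_response(c):.1f}%")
```

Under this assumed form, the curve starts at R_min for zero rapalog and, at c = c_0, sits halfway between R_min and 100%, whatever the value of n.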

RAB11 antibody, partially co-localize with transferrin receptors and
interact with RAB11FIP1. a, Images of untransfected cells or cells transfected
with CRY-RAB11, FKBP-RAB11 or tagRFPt-RAB11, co-stained with
anti-RAB11 antibody (inverted contrast). Red lines indicate cell outline. Scale
bar, 2.5 µm. b, Images of cells transfected with TfR-GFP only, or co-transfected
Red lines indicate cell outline. Scale bar, 2.5 µm. c-e, GFP pull-down assays
with lysates of HEK cells expressing GFP or GFP-RAB11FIP1 together with
tagRFPt-RAB11 (c), FKBP-RAB11 or FKBP-tagRFPt-RAB6 (d) or CRY-RAB11
(e) were analysed by western blotting using antibodies against tagRFPt
and GFP.

Extended Data Figure 7 | BICDN overexpression does not significantly
inhibit dynein-based transport and the growth cone cytoskeleton is not
affected by light-induced recruitment of BICDN to recycling endosomes.
a, b, Left: kymograph of dense-core vesicle motility in an axon expressing
neuropeptide Y (NPY) fused to GFP (NPY-GFP) and empty vector (a) or
BICDN-PDZ (b) (inverted contrast), representative of n = 5 and n = 10 axons,
respectively. Right: corresponding binary image of traces used for further
analysis of anterograde and retrograde movements. Scale bars, 5 µm and 10 s.
c, Position of dense-core vesicles along an axon expressing NPY-GFP and
BICDN-PDZ. Single coloured arrowheads point to the same vesicle,
highlighting retrograde (red), anterograde (black) and non-moving (orange)
vesicles. Scale bar, 5 µm. d, Quantification of the percentage of static,
anterograde and retrograde moving vesicles from kymographs shown in a and
b in axons with (n = 10) or without (n = 5) BICDN-PDZ overexpression.
Graph shows mean ± s.e.m., *P > 0.05, one-way ANOVA and Bonferroni’s
multiple comparison test. e, A fusion of FRB, tagBFP and LOVpep (FRB-LOV)
and a fusion of GFP and ePDZb1 (PDZ) were expressed in neurons. After
blue-light illumination, LOVpep undergoes a conformational change, allowing
binding of PDZ to FRB-LOV. f, Actin dynamics in growth cones coexpressing
mRFP-actin along with the constructs shown in e, in response to light-induced
heterodimerization of FRB-LOV and PDZ, representative of n = 5 growth
cones. The blue box indicates the interval of blue-light illumination. Scale bar,
5 µm. g, Imaging of growing microtubule (MT) plus ends using tdTomato-MACF18
shows the dynamics of microtubule plus-ends in growth cones before
and during blue-light illumination in neurons co-expressing FKBP-RAB11,
FRB-LOV and BICDN-PDZ. Red line indicates cell outline, arrowheads
point at plus-ends. Scale bar, 5 µm. h, Kymograph of MACF18 comets of
the growth cone shown in g and binarized traces used for analysis,
representative of n = 4 growth cones. Blue box indicates blue-light illumination
interval. Scale bars, 5 µm and 1 min. i, Area measurement of growth cone
shown in g before and during blue-light illumination. j, Quantification of
the number of MACF18 comets per minute in growth cones before and during
blue-light illumination (n = 4 neurons). Graph shows mean ± s.e.m. Paired
two-tailed t-test, n = 4 cells. k, Quantification of the growth length of MACF18
comets in growth cones before and during blue-light illumination (n = 4
neurons). Graph shows mean ± s.e.m. Paired two-tailed t-test, n = 4 cells.
l, Distribution of fraction of MACF18 comets per growth length in bins of
0.5 µm (n = 214 traces). m, Quantification of the growth speed of MACF18
comets in growth cones before and during blue-light illumination (n = 4
neurons). Graph shows mean ± s.e.m. Paired two-tailed t-test, n = 4 cells.
n, Distribution of fraction of MACF18 comets per growth speed in bins of
0.1 µm s⁻¹ (n = 214 traces).

Extended Data Figure 8 | Intensity rescaling and accurate growth cone area
measurements based on RAB11 fluorescence. a, Mean intensity of growth
cone FKBP-RAB11 fluorescence from neurons expressing BICDN-PDZ in the
absence (black, n = 21) or presence of FRB-LOV (red, n = 25) normalized to
the intensity before (t_2.39 min) (left axis) and rescaled relative to the intensity
of −LOV growth cones at tg min (right axis). Blue box indicates blue-light
illuminated interval. Graph shows mean ± s.e.m. b, Quantification of FKBP-RAB11
fluorescence intensity in the same neurons as shown in a after 8 min of
blue-light illumination, normalized to the average fluorescence at tg min in
control neurons. Graph shows mean ± s.e.m., ***P < 0.0001, Mann-Whitney
test. c, Area measurements of two representative growth cones from neurons 
expressing FKBP-RAB11, FRB-LOV, BICDN-PDZ and soluble GFP over 
time. Representative of five growth cones (shown in d and e). d, Normalized 
tagRFPt-RAB11 intensity of five growth cones as in c plotted against their 
normalized GFP intensity. Intensity values are averaged over the first five 
frames per growth cone. Pearson correlation coefficient (r) for each growth 
cone is indicated in top left corner. Same colour indicates measurements of the 
same growth cone. e, FKBP-RAB11-based area measurements plotted against
GFP-based area measurements of the same growth cones as in d. Pearson 
correlation coefficient (r) for each growth cone is indicated in top left corner. 
Same colour indicates measurements of the same growth cone. f, Traces of 
growth cone area measurements based on FKBP-RAB11 signal in the absence
(n = 25, red trace) and presence of FRB-LOV (n = 21, black trace) in growth
cones before and during blue-light illumination (see Methods). Graph
shows mean ± s.e.m. Blue box indicates blue light-exposed interval.
g, Quantification of the area increase in the absence and presence of FRB-LOV
in growth cones before blue-light illumination (−4 to 0 min). Values per
growth cone are averaged over three frames. Graph shows mean ± s.e.m.
P = 0.4145 (n.s., not significant), Mann-Whitney test. h, Cumulative
histogram showing the fraction of growth cones with area shrinkage or growth
(left or right of dashed line, respectively) before blue-light illumination
(−4 to 0 min). Values per growth cone are averaged over three frames.
i, Quantification of the area change of −FRB-LOV and +FRB-LOV growth
cones during blue-light illumination (0 to 8 min). Values per growth cone are
averaged over three frames. Graph shows mean ± s.e.m., *P = 0.0206,
Mann-Whitney test. j, Cumulative histogram showing the fraction of
growth cones with area shrinkage or growth (left or right of dashed line,
respectively) during blue-light illumination (0 to 8 min). Values per growth cone
are averaged over three frames. k, Scatter plot showing net growth during
blue-light illumination and normalized fluorescence intensity after blue-light
illumination per +FRB-LOV (red) or −FRB-LOV (black) growth cone.

Metabolic coupling of two small-molecule thiols
programs the biosynthesis of lincomycin A


Qunfei Zhao*, Min Wang*, Dongxiao Xu, Qinglin Zhang & Wen Liu


Low-molecular-mass thiols in organisms are well known for their 
redox-relevant role in protection against various endogenous and 
exogenous stresses. In eukaryotes and Gram-negative bacteria, the
primary thiol is glutathione (GSH), a cysteinyl-containing tripep- 
tide. In contrast, mycothiol (MSH), a cysteinyl pseudo-disaccharide, 
is dominant in Gram-positive actinobacteria, including antibiotic- 
producing actinomycetes and pathogenic mycobacteria. MSH is equi- 
valent to GSH, either as a cofactor or as a substrate, in numerous 
biochemical processes, most of which have not been characterized,
largely due to the dearth of information concerning MSH-dependent 
proteins. Actinomycetes are able to produce another thiol, ergothio- 
neine (EGT), a histidine betaine derivative that is widely assimilated 
by plants and animals for variable physiological activities. The involvement
of EGT in enzymatic reactions, however, lacks any precedent.
Here we report that the unprecedented coupling of two bacterial 
thiols, MSH and EGT, has a constructive role in the biosynthesis of 
lincomycin A, a sulfur-containing lincosamide (C8 sugar) antibiotic 
that has been widely used for half a century to treat Gram-positive 
bacterial infections. EGT acts as a carrier to template the molecular
assembly, and MSH is the sulfur donor for lincomycin maturation
after thiol exchange. These thiols function through two unusual
S-glycosylations that program lincosamide transfer, activation and 
modification, providing the first paradigm for EGT-associated bio- 
chemical processes and for the poorly understood MSH-dependent 
biotransformations, a newly described model that is potentially com- 
mon in the incorporation of sulfur, an element essential for life and 
ubiquitous in living systems. 

Mycothiol (MSH, Fig. 1a)-mediated detoxification typically relies on
the conjugation of MSH to an electrophilic toxin. An amidase, Mca,
then hydrolyses the resulting MSH S-conjugate to produce a pseudo-disaccharide
unit, 1-O-glucosamine-D-myo-inositol (GlcN-Ins), and a
mercapturic acid derivative, which is an excretive N-acetyl-cysteinyl product
common in GSH-mediated metabolism (Fig. 1b). In actinomycetes,
mca orthologues have been found in several biosynthetic gene
clusters of antibiotics, including that of lincomycin A, suggesting that
MSH S-conjugation is associated with the production of these antibiotics.
Lincomycin A consists of an N-methylated 4-propyl-L-proline
(PPL) moiety and lincosamide, an unusual eight-carbon sugar decorated
with a methylmercapto group at C-1 (Fig. 1d). Cell protection against the
activity of lincomycin A depends largely on methylation of the bacterial
ribosome, whereby the molecule mimics the 3′ end of (de)acetyl-tRNA
and blocks protein synthesis at the initial stage of the elongation cycle.
This fact, in combination with the methylmercapto group found in the
structure, leads to the proposal that MSH S-conjugation has a constructive
role in lincomycin biosynthesis by supplying sulfur rather
than a protective role in antibiotic detoxification. To validate this hypothesis,
we inactivated lmbE, a mca orthologue that is located in the
lincomycin biosynthetic gene cluster (lmb) in Streptomyces lincolnensis
(Extended Data Fig. 2a).


As anticipated, the ΔlmbE mutant strain accumulated a MSH-associated
lincomycin analogue, 1 (Fig. 2). In this molecule, the C8-sugar unit is
appended to MSH via an α-S-linkage (Supplementary Text), identical
to that of lincomycin A in configuration. The ΔlmbE mutant strain still
produced lincomycin A, albeit in a lower yield, indicating that an additional
mca orthologue is present outside the lmb cluster and partially

Figure 1 | Representative low-molecular-mass thiols, relevant metabolic
pathways, and associated lincosamide natural products. a, Structures of the 
thiols GSH, MSH and EGT. b, Biosynthetic pathway of MSH and its typical 
associated detoxification process (shown in the dashed rectangle). GlcN-Ins is 
recyclable as an intermediate or product. c, Biosynthetic pathway of EGT. 
EgtD is an S-adenosyl methionine-dependent protein that catalyses the first 
reaction. d, Structures of the lincosamide antibiotics lincomycin A and 
celesticetin. 

Figure 2 | Characterization of LmbE as a pathway-specific Mca protein to
process 1, the MSH S-conjugated intermediate, in the biosynthesis of
lincomycin A. For high-performance liquid chromatography-mass
spectrometry (HPLC-MS) analysis, the electrospray ionization (ESI) m/z
[M + H]⁺ modes are indicated in the dashed rectangle. a, In vivo product
profiles of S. lincolnensis strains, including the mutants (i, for ΔlmbE; ii, for
ΔlmbE-E3457; iii, for ΔmshAlin; and iv, for ΔmshAlin supplemented with
~0.20 mM intermediate 1) and the wild-type control (v). b, Time-dependent
biotransformation of mercapturic acid 2 (~0.20 mM) into lincomycin A using
the cell homogenate of the ΔmshAlin mutant strain. c, In vitro hydrolysis of
intermediate 1 to generate 2 and GlcN-Ins (left) in the absence (top right)
and presence (lower right) of LmbE.


compensates for the loss of lmbE. Sequencing the genome of S. lincolnensis
revealed three lmbE homologues: lmbE80, lmbE447 and lmbE3457. The
individual inactivation of these genes was performed in the ΔlmbE mutant
strain, but only the ΔlmbE-E3457 double mutant strain completely lost
the ability to produce lincomycin and had a concomitant increase in 1 (Fig. 2a
and Extended Data Fig. 3). Moreover, we expressed and purified LmbE
from Escherichia coli. This recombinant protein rapidly converted 1
into two products (Fig. 2c): GlcN-Ins and 2, a mercapturic acid derivative
(Supplementary Text). Thus, the involvement of LmbE as a specific
Mca protein in lincomycin biosynthesis was established.

MSH biosynthesis begins with the glycosyltransferase (GTase) MshA,
which catalyses the formation of
1-O-(2-N-acetyl)-glucosamine-D-myo-inositol-3-phosphate (GlcNAc-Ins-3-P)
to afford the essential pseudo-disaccharide unit (Fig. 1b). To validate the
necessity of MSH for lincomycin biosynthesis, we identified a mshA
orthologue (Extended Data Fig. 2b), mshAlin, from the S. lincolnensis genome
and inactivated it in the wild-type strain. The ΔmshAlin mutant strain failed
to produce both MSH and lincomycin A (Fig. 2a and Extended Data Fig. 4). MSH
S-conjugate 1 was then added to this mutant, leading to the restoration
of lincomycin production (Fig. 2a). Unambiguously, 1 is a key intermediate
rather than a detoxified antibiotic in the biosynthetic pathway. Notably,
adding ~0.20 mM 1 to the cells yielded ~1.08 mM lincomycin A
after a 5-day cultivation period, representing a ~4-fold increase in product
concentration compared with the precursor. This increase could
result from the regeneration of intermediate 1 by restoring the MSH
cycle, because GlcN-Ins, the product originating from 1 through LmbE-catalysed
hydrolysis, is the intermediate in MSH biosynthesis (Fig. 1b).
The cell homogenate of the ΔmshAlin mutant strain was capable of
transforming ~0.20 mM 2 into a nearly equal amount of lincomycin A
(Fig. 2b), further confirming the essentiality of the LmbE-catalysed reaction
in the lincomycin biosynthetic pathway, which hydrolyses 1 to
generate recyclable GlcN-Ins and intermediate 2. Processing 2, including
C-S bond cleavage of the cysteinyl group (mechanistically similar
to that in the GSH-associated biosynthesis of gliotoxin) and subsequent
S-methylation of the exposed sulfhydryl group, may eventually afford
the sulfur appendage of lincomycin A. Notably, the generation of excretive
thiomethyl products has previously been found as an alternative
to the mercapturate pathway for xenobiotic detoxification in GSH-mediated
metabolism.
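
The feeding result quoted above (~0.20 mM of 1 added, ~1.08 mM lincomycin A recovered) can be checked with simple arithmetic. The sketch below uses a hypothetical helper, `fold_metrics`, and treats the two approximate concentrations from the text as exact inputs purely for illustration.

```python
def fold_metrics(precursor_mM, product_mM):
    """Return (product/precursor ratio, net fold increase over the precursor)."""
    if precursor_mM <= 0:
        raise ValueError("precursor_mM must be positive")
    ratio = product_mM / precursor_mM
    fold_increase = (product_mM - precursor_mM) / precursor_mM
    return ratio, fold_increase


# Approximate values quoted in the text, used here as exact inputs.
ratio, fold_increase = fold_metrics(precursor_mM=0.20, product_mM=1.08)
print(f"product / precursor ratio: {ratio:.1f}")
print(f"net fold increase:         {fold_increase:.1f}")
```

With these inputs the product is about 5.4 times the added precursor, which corresponds to a net increase of about 4.4-fold and matches the "~4-fold increase" wording. Any ratio above 1 means more lincomycin A was formed than intermediate 1 was supplied, which is the observation the regeneration argument rests on.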

We next focused on how and when MSH is incorporated to generate
S-conjugate 1. Sequence analysis of the lmb cluster revealed two closely
linked, functionally unassigned genes (Extended Data Fig. 2a), lmbT
and lmbV. lmbT encodes a protein belonging to the GTase superfamily.
Recently, lincosamide formation has been shown to involve the generation
of GDP-D-α-D-octose and its associated modifications. The phosphonucleotidyl
group on the resulting product, GDP-D-α-D-lincosamide
(3) (Fig. 3), facilitates the attack by nucleophiles such as MSH, probably
requiring a GTase activity; however, direct transfer of lincosamide from
GDP onto MSH appears unlikely, because GTase-catalysed glycosylation
is often envisioned as proceeding through SN2 nucleophilic displacement,
which inverts the anomeric configuration and therefore would not explain
the same α-linkage that is predicted in 3 and found in intermediate 1.
lmbV encodes a protein classified into
the DinB-2 superfamily (Extended Data Fig. 5), which now includes
over ten thousand members with activities presumably related to various
low-molecular-mass thiols. These proteins share a conserved DinB-2-like
domain combined with variable functional domain(s), indicating
that thiols potentially act in different biochemical processes. Notably, a
clade phylogenetically relevant to LmbV contains the MSH-maleylpyruvate
isomerase (Extended Data Fig. 5), one of the few MSH-dependent
proteins that have been biochemically characterized for the isomerization
of maleylpyruvate to fumarylpyruvate. Accordingly, LmbV was
proposed to catalyse a MSH-dependent reaction, although the role of
MSH remained unknown.

We next established the relevance of LmbT and LmbV to lincomycin
biosynthesis, as the inactivation of lmbT or lmbV completely abolished
lincomycin production, which was partially restored by complementing
each of the genes in the corresponding mutant strain (Fig. 3a). Surprisingly,
a lincomycin analogue, EGT S-conjugate 4, was found in the ΔlmbV
mutant (Fig. 3 and Supplementary Text). Re-examination of the ΔmshAlin
mutant strain showed a similar product profile in which 4 was the major
product (Fig. 3a); evidently, EGT S-conjugation is independent of MSH.
To correlate this thiol with lincomycin production, we surveyed the
S. lincolnensis genome and identified the EGT biosynthetic genes (Extended
Data Fig. 2c), among which egtDlin, responsible for triple N-methylation
of histidine in EGT biosynthesis (Fig. 1c), was chosen for inactivation.
The ΔegtDlin mutant strain produced lincomycin A, the yield of which,
however, was significantly reduced at the indicated time points (Fig. 3b).
Analysis of the mutant cells revealed a trace amount of EGT (with a yield
~16.8% of that in the wild-type cells, Extended Data Fig. 4), which
probably resulted from assimilation from the culture medium, which
included plant-derived, EGT-containing components. Over
time, the production of lincomycin A from the mutant became much
lower than that from the wild-type strain (Fig. 3b), presumably because
of the limiting amount of EGT. Altogether, EGT is necessary for lincomycin
biosynthesis, whereby its S-conjugate 4 is an intermediate
instead of a shunt product in the pathway.

Notably, in EGT S-conjugate 4, lincosamide is attached to the thiol
via a β-S-linkage (Fig. 3 and Supplementary Text). If 1 is the product of
LmbV, MSH is probably the agent that performs the nucleophilic attack
at C-1 of the lincosamide unit, which is activated by EGT rather than
(d)NDP, and the resulting inversion of configuration is consistent with
the glycosylation reaction that proceeds via an SN2 displacement to
generate the α-S-linkage in MSH S-conjugate 1. Therefore, we chemically
synthesized MSH (Supplementary Methods), making use of the
scaled-up precursor GlcN-Ins, for an in vitro assay of LmbV activity.
Despite our numerous attempts, LmbV was highly refractory to various
methods of expression. We then selected CcbV, a homologue of LmbV
(57% identity) in celesticetin biosynthesis, as the catalyst. Celesticetin
is a naturally occurring analogue of lincomycin A that possesses the
same α-S-linkage (Fig. 1d). Heterologous expression of the gene ccbV
in the ΔlmbV mutant strain partially restored the production of lincomycin
A (Fig. 3a), confirming that CcbV functionally substitutes for
LmbV. The in vitro conversion of 4 occurred in the presence of CcbV,
resulting in the production of 1 along with the release of EGT (Fig. 3c
and Extended Data Fig. 6). This reaction is irreversible, as MSH S-conjugate
1 was not converted back to EGT S-conjugate 4 in the presence of
EGT. Therefore, LmbV and its counterpart CcbV form a new type of
GTase that catalyses a unique S-glycosylation for thiol exchange between
EGT and MSH.

We thus considered that the other GTase, LmbT, could transfer lincosamine
from GDP onto EGT via SN2 nucleophilic displacement. LmbT
was expressed and purified, and to determine its activity, the reverse
glycosylation reaction was conducted according to a method established
previously using saturated GDP as the nucleophilic agent. Because
LmbT did not convert EGT S-conjugate 4 to the corresponding GDP-mediated
sugar product (Extended Data Fig. 7a), we re-examined the
ΔlmbT mutant strain and found that the PPL moiety had accumulated
extensively (Extended Data Fig. 8a). This finding suggested that the
S-glycosylation of EGT precedes PPL incorporation. Consistently, inactivation
of lmbC, which encodes an adenylation protein characterized
in PPL activation, resulted in the production of PPL and an EGT
S-conjugated lincosamine, 5 (Extended Data Fig. 8a and Supplementary
Text). Compared with 4, 5 lacks the PPL moiety. A similar result
was found in the ΔlmbN and ΔlmbD mutant S. lincolnensis strains (Extended
Data Fig. 8a), supporting the view that the corresponding proteins, LmbC,
LmbN and LmbD, are responsible for incorporating PPL to transform
5 into 4. LmbN is a bi-functional protein, and its 1,2-isomerization
activity has recently been shown in lincosamine formation. Careful
analysis of the protein sequence revealed a peptidyl carrier protein (PCP)
domain present at its amino terminus. LmbC is thus considered to
activate PPL with ATP and transfer it onto this PCP domain, followed
by LmbD-catalysed condensation with EGT S-conjugate 5 to afford 4
(Extended Data Fig. 8b). Consequently, LmbC, LmbN (or LmbN-PCP,
the N-terminal PCP domain) and CcbD (the homologue in celesticetin
biosynthesis that is functionally identical to LmbD) were expressed and
purified, and in vitro assays showed that these proteins indeed coordinate
PPL attachment to generate 4 (Extended Data Fig. 8). Clearly,
functionalization of the lincosamine unit by PPL occurs in an EGT
S-conjugated form, and the PPL-lacking compound 5 probably serves
as the product of LmbT-catalysed S-glycosylation.

As anticipated, in the presence of LmbT and the co-substrate GDP,
5 was efficiently transformed to GDP-D-α-D-lincosamide 3, accompanied
by EGT release (Fig. 3d and Extended Data Fig. 9). We then validated
the forward activity of LmbT to produce 5 and GDP using the
substrates 3 and EGT (Extended Data Fig. 7b). The LmbT-catalysed
reaction exhibited an equilibrium constant K_eq of 1.94 (Extended Data
Fig. 9d), indicating that the reverse and forward conversions are comparable;
however, the activities of downstream enzymes may drive the
pathway forward to produce lincomycin A. Consequently, LmbT represents
a new enzyme that employs EGT as a sugar acceptor and catalyses
S-glycosylation with the naturally rare C8 sugar to generate EGT
S-conjugate 5.
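
To see why an equilibrium constant close to 1 means the two directions are comparable, the reported value can be turned into an equilibrium conversion and a standard free-energy change. The sketch below is illustrative only and rests on assumptions not stated in the text: a two-substrate, two-product reaction (A + B ⇌ C + D) started from equal amounts of A and B with no products, ideal behaviour, and a temperature of 298.15 K. The function names are hypothetical, and the text does not say in which direction K_eq = 1.94 is defined.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # molar gas constant, kJ mol^-1 K^-1


def equilibrium_conversion(k_eq):
    """Fraction of substrate converted at equilibrium for A + B <=> C + D.

    Assumes equal starting amounts a of A and B and no products, so that
    K = x^2 / (a - x)^2 and x / a = sqrt(K) / (1 + sqrt(K)).
    """
    if k_eq <= 0:
        raise ValueError("k_eq must be positive")
    root = math.sqrt(k_eq)
    return root / (1.0 + root)


def standard_free_energy_kj(k_eq, temperature_k=298.15):
    """Standard free-energy change, dG = -R * T * ln(K), in kJ per mol."""
    if k_eq <= 0:
        raise ValueError("k_eq must be positive")
    return -GAS_CONSTANT_KJ * temperature_k * math.log(k_eq)


K_EQ = 1.94  # value reported in the text; direction assumed, see lead-in
print(f"equilibrium conversion: {equilibrium_conversion(K_EQ):.2f}")
print(f"standard free energy:   {standard_free_energy_kj(K_EQ):.2f} kJ/mol")
```

Under these assumptions a little under 60% of the substrates are converted at equilibrium and the free-energy change is small, so neither direction is strongly favoured. This is consistent with the statement that downstream enzymes, rather than LmbT itself, may pull the pathway forward.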

The ΔlmbC, ΔlmbN or ΔlmbD mutant strain produced a minor product,
6 (Extended Data Fig. 8a), which is a MSH S-conjugated lincosamide
lacking the PPL moiety (Supplementary Text), suggesting that
LmbV tolerated 5 as a substrate for thiol exchange. CcbV, the homologue
of LmbV, converted 5 into 6 in vitro (Extended Data Fig. 10a);
however, the rate of this reaction was much lower than that for the
transformation of 4 into 1. The Mca protein LmbE was not active on 6
and failed to produce mercapturic acid for further processing (Extended
Data Fig. 10b). Clearly, lincomycin biosynthesis involves the EGT-mediated
assembly of lincosamide with the building blocks, PPL and MSH, and
the MSH-associated post-modifications in a highly ordered process.

We uncovered a constructive role of two different bacterial thiols in 
S. lincolnensis and clarified the biosynthetic pathway of lincomycin A, 
in which EGT and MSH-associated metabolism creates an extraordinary
biochemical strategy for the formation of the active molecule (Fig. 4
and Extended Data Fig. 1). Lincomycin biosynthesis involves a con- 
vergent pathway to synthesize the L-Tyr-derived PPL moiety and the 
GDP-activated C8 sugar lincosamide. The thiol EGT serves as a carrier, 
via the first S-glycosylation (reversible), to channel the lincosamine unit 
and mediates its condensation with PPL. The thiol MSH goes through 
the second S-glycosylation (irreversible) to associate with the lincomy- 
cin intermediate resulting from EGT and then acts as the sulfur donor 
for affording the methylmercapto group. Both thiols are recyclable or 
reproducible, thus maintaining the biosynthetic route to lincomycin A. 

The characterization of the low-molecular-mass thiol-programmed 
biosynthetic pathway largely expands our knowledge regarding the 
intrinsic, versatile functions of thiols, which are apparently not limited 
to a protective role against oxidative stress and the neutralization of 
electrophilic toxins. The lincosamide antibiotic celesticetin probably 
shares this biosynthetic strategy based on the thiols EGT and MSH, 
despite their differences in processing the MSH appendage. The involvement
of EGT in C8-sugar transfer and activation represents the first
biochemical evidence of this thiol in enzymatic reactions and generates
interest in nucleotide-independent sugar modifications and associated
glycosylations, which have been less appreciated to date. Sulfur is one
of the most abundant elements in living organisms and contributes to a
large number of biologically active natural products; however, incorporation
of this atom has not been well established. Complementing the
recent advance of co-opting the sulfur-delivery system of primary metabolism
for thiosugar formation, we demonstrated that MSH serves as
a different source for sulfur incorporation (Fig. 4). This could be a
general paradigm because the biosynthetic pathways of several sulfur-containing
natural products involve homologues of Mca, the protein
responsible for processing the MSH S-conjugate. The findings reported
here represent a key step towards elucidating the biochemical mechan- 
isms of numerous MSH and EGT-dependent but poorly understood 
proteins and exploring new features of thiols with regard to their cur- 
rently unknown associated biochemical processes. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS

General materials and methods. Biochemicals and media were purchased from 
Sinopharm Chemical Reagent Co., Ltd (China), Oxoid Ltd (UK) or Sigma-Aldrich 
Corporation (USA) unless otherwise stated. Restriction endonucleases were pur- 
chased from Thermo Fisher Scientific Co. Ltd (USA). Chemical reagents were pur- 
chased from standard commercial sources. 

DNA isolation and manipulation in Escherichia coli or Streptomyces strains were 
carried out according to standard methods. PCR amplifications were carried
out on an Applied Biosystems Veriti Thermal Cycler using either Taq DNA poly- 
merase (Vazyme Biotech Co. Ltd, China) for routine genotype verification or Phanta 
Max Super-Fidelity DNA Polymerase (Vazyme Biotech Co. Ltd, China) for high 
fidelity amplification. Primer synthesis was performed at Shanghai Sangon Biotech 
Co. Ltd (China), and DNA sequencing was performed at Shanghai Majorbio Biotech 
Co. Ltd or Shenzhen BGI in China. 

High performance liquid chromatography (HPLC) analysis was carried out on 
the Agilent 1200 HPLC system (Agilent Technologies Inc., USA). HPLC Electrospray 
ionization MS (HPLC-ESI-MS) analysis was performed on the Thermo Fisher LTQ 
Fleet ESI-MS spectrometer (Thermo Fisher Scientific Inc., USA), and the data were 
analysed using Thermo Xcalibur software. ESI-high resolution MS (ESI-HR-MS) 
analysis was carried out on the 6230B Accurate-Mass TOF LC/MS System or 6530 
Accurate-Mass Q-TOF LC/MS System (Agilent Technologies Inc., USA) and the 
data were analysed using Agilent MassHunter Qualitative Analysis software. NMR 
data were recorded on the Bruker DRX400 and Bruker AV500 spectrometers (Bruker 
Co. Ltd, Germany), or on the Agilent 500 MHz PremiumCompact+ NMR spectro- 
meter (Agilent Technologies Inc., USA). 

No statistical methods were used to predetermine sample size. 

Gene inactivation and complementation. The inactivation of each gene in S. 
lincolnensis was performed by in-frame deletion to exclude polar effects on down- 
stream gene expression. For complementation in trans, the target gene was under 
the control of PermE*, the constitutive promoter for expressing the erythromycin- 
resistance gene in Saccharopolyspora erythraea. The genomic DNA of the S. lin- 
colnensis wild-type strain served as the template for PCR amplification unless 
otherwise stated. 

For lmbE deletion, the 2.52-kb fragment obtained using primers lmbE-L-for
and lmbE-L-rev was digested by EcoRI and XbaI, and cloned into the same sites of
pKC1139 to yield plasmid pLL1001. The 2.34-kb fragment obtained using primers
lmbE-R-for and lmbE-R-rev was digested by XbaI and HindIII and cloned into the
same site of pLL1001 to yield the recombinant plasmid pLL1002, in which a 549-bp
in-frame coding region of lmbE was deleted. To transfer pLL1002 into the lincomycin-
producing strain S. lincolnensis, conjugation between E. coli ET12567-Streptomyces 
was carried out following the standard procedure’. The colonies that were apra- 
mycin resistant at 37 °C were identified as integrating mutants, in which a single- 
crossover homologous recombination event took place. These mutants were cultured 
for several rounds in the absence of apramycin, and the resulting apramycin-sensitive 
isolates were subjected to PCR amplification to examine the genotype, as judged by 
the formation of the desired 1.1-kb product when using primers lmbE-gt-for and
lmbE-gt-rev. Further sequencing of this PCR product confirmed the genotype of
LL1001, in which lmbE was in-frame deleted.

For lmbE-E80 double deletion, the 1.84-kb fragment obtained using primers
lmbE80-L-for and lmbE80-L-rev was digested by EcoRI and XbaI, and cloned into
the same sites of pKC1139 to yield plasmid pLL1003. The 2.06-kb fragment obtained
using primers lmbE80-R-for and lmbE80-R-rev was digested by XbaI and HindIII
and cloned into the same site of pLL1003, yielding the recombinant plasmid pLL1004.
Following the procedure described above, pLL1004 was introduced into LL1001 for
double-crossover recombination, yielding the recombinant strain LL1002, in which
the 447-bp internal fragment of lmbE80 was deleted in frame. Primers
lmbE80-gt-for and lmbE80-gt-rev were used for genotype validation by PCR amplification.
Further sequencing of this PCR product confirmed the genotype of LL1002, in
which lmbE80 was also in-frame deleted.

For lmbE-E447 double deletion, the 1.81-kb fragment obtained using primers
lmbE447-L-for and lmbE447-L-rev was digested by EcoRI and XbaI, and cloned
into the same sites of pKC1139 to yield plasmid pLL1005. The 1.82-kb fragment
obtained using primers lmbE447-R-for and lmbE447-R-rev was digested by XbaI
and HindIII and cloned into the same site of pLL1005, yielding the recombinant
plasmid pLL1006. Following the procedure described above, pLL1006 was introduced
into LL1001 for double-crossover recombination, yielding the recombinant
strain LL1003, in which the 459-bp internal fragment of lmbE447 was deleted in
frame. Primers lmbE447-gt-for and lmbE447-gt-rev were used for genotype validation
by PCR amplification. Further sequencing of this PCR product confirmed
the genotype of LL1003, in which lmbE447 was also in-frame deleted.

For lmbE-E3457 double deletion, the 1.79-kb fragment obtained using primers
lmbE3457-L-for and lmbE3457-L-rev was digested by EcoRI and XbaI, and cloned
into the same sites of pKC1139 to yield plasmid pLL1007. The 1.89-kb fragment
obtained using primers lmbE3457-R-for and lmbE3457-R-rev was digested by
XbaI and HindIII and cloned into the same site of pLL1007, yielding the recombinant
plasmid pLL1008. Following the procedure described above, pLL1008 was
introduced into LL1001 for double-crossover recombination, yielding the recombinant
strain LL1004, in which the 402-bp internal fragment of lmbE3457 was deleted
in frame. Primers lmbE3457-gt-for and lmbE3457-gt-rev were used for genotype
validation by PCR amplification. Further sequencing of this PCR product confirmed
the genotype of LL1004, in which lmbE3457 was also in-frame deleted.

For mshAlin deletion, the 2.25-kb fragment obtained using primers mshA-L-for
and mshA-L-rev was digested by EcoRI and XbaI, and cloned into the same sites of
pKC1139 to yield plasmid pLL1009. The 2.18-kb fragment obtained using primers
mshA-R-for and mshA-R-rev was digested by XbaI and HindIII and cloned into
the same site of pLL1009 to yield the recombinant plasmid pLL1010, in which a
633-bp in-frame coding region of mshAlin was deleted. Following the procedure
described above, pLL1010 was introduced into the S. lincolnensis wild-type strain
for double-crossover recombination. The resulting strain LL1005 was then subjected
to PCR amplification to examine the genotype, as judged by the formation of
the desired 1.09-kb product when using primers mshA-gt-for and mshA-gt-rev.
Further sequencing of this PCR product confirmed the genotype of LL1005, in
which mshAlin was in-frame deleted.

For lmbT deletion, the 2.56-kb fragment obtained using primers lmbT-L-for
and lmbT-L-rev was digested by EcoRI and XbaI, and cloned into the same sites of
pKC1139 to yield plasmid pLL1011. The 2.56-kb fragment obtained using primers
lmbT-R-for and lmbT-R-rev was digested by XbaI and HindIII and cloned into the
same site of pLL1011 to yield the recombinant plasmid pLL1012, in which a 462-bp
in-frame coding region of lmbT was deleted. Following the procedure described
above, pLL1012 was introduced into the S. lincolnensis wild-type strain for
double-crossover recombination. The resulting strain LL1006 was then subjected to PCR
amplification to examine the genotype, as judged by the formation of the desired
1.04-kb product when using primers lmbT-gt-for and lmbT-gt-rev. Further sequencing
of this PCR product confirmed the genotype of LL1006, in which lmbT was
in-frame deleted.

For lmbV deletion, the 2.29-kb fragment obtained using primers lmbV-L-for
and lmbV-L-rev was digested by EcoRI and XbaI, and cloned into the same sites of
pKC1139 to yield plasmid pLL1013. The 2.30-kb fragment obtained using primers
lmbV-R-for and lmbV-R-rev was digested by XbaI and HindIII and cloned into
the same site of pLL1013 to yield the recombinant plasmid pLL1014, in which a
399-bp in-frame coding region of lmbV was deleted. Following the procedure
described above, pLL1014 was introduced into the S. lincolnensis wild-type strain
for double-crossover recombination. The resulting strain LL1007 was then subjected
to PCR amplification to examine the genotype, as judged by the formation of
the desired 1.23-kb product when using primers lmbV-gt-for and lmbV-gt-rev.
Further sequencing of this PCR product confirmed the genotype of LL1007, in
which lmbV was in-frame deleted.

For egtDlin deletion, the 1.98-kb fragment obtained using primers egtD-L-for
and egtD-L-rev was digested by EcoRI and XbaI, and cloned into the same sites of
pKC1139 to yield plasmid pLL1019. The 1.75-kb fragment obtained using primers
egtD-R-for and egtD-R-rev was digested by XbaI and HindIII and cloned into the
same site of pLL1019 to yield the recombinant plasmid pLL1020, in which a 465-bp
in-frame coding region of egtDlin was deleted. Following the procedure described
above, pLL1020 was introduced into the S. lincolnensis wild-type strain for
double-crossover recombination. The resulting strain LL1010 was then subjected to PCR
amplification to examine the genotype, as judged by the formation of the desired
2.05-kb product when using primers egtD-gt-for and egtD-gt-rev. Further sequencing
of this PCR product confirmed the genotype of LL1010, in which egtDlin was
in-frame deleted.

For lmbC deletion, the 1.69-kb fragment obtained by using primers lmbC-L-for
and lmbC-L-rev was digested by EcoRI and XbaI, and cloned into the same sites of
pKC1139 to yield plasmid pLL1023. The 1.82-kb fragment obtained by using primers
lmbC-R-for and lmbC-R-rev was digested by XbaI and HindIII and cloned
into the same site of pLL1023 to yield the recombinant plasmid pLL1024, in which
a 708-bp in-frame coding region of lmbC was deleted. Following the procedure
described above, pLL1024 was introduced into the S. lincolnensis wild-type strain
for double-crossover recombination. The resulting strain LL1012 was then subjected
to PCR amplification to examine the genotype, as judged by the formation of
the desired 0.82-kb product when using primers lmbC-gt-for and lmbC-gt-rev.
Further sequencing of this PCR product confirmed the genotype of LL1012, in
which lmbC was in-frame deleted.

For lmbD deletion, the 2.54-kb fragment obtained using primers lmbD-L-for
and lmbD-L-rev was digested by EcoRI and XbaI, and cloned into the same sites of
pKC1139 to yield plasmid pLL1031. The 2.51-kb fragment obtained using primers
lmbD-R-for and lmbD-R-rev was digested by XbaI and HindIII and cloned into
the same site of pLL1031 to yield the recombinant plasmid pLL1032, in which a
459-bp in-frame coding region of lmbD was deleted. Following the procedure
described above, pLL1032 was introduced into the S. lincolnensis wild-type strain
for double-crossover recombination. The resulting strain LL1016 was then subjected
to PCR amplification to examine the genotype, as judged by the formation of
the desired 0.60-kb product when using primers lmbD-gt-for and lmbD-gt-rev.
Further sequencing of this PCR product confirmed the genotype of LL1016, in
which lmbD was in-frame deleted.
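
As a quick arithmetic check on the deletion designs described above, each deleted length should be a multiple of 3 bp so that the downstream reading frame is preserved. The following is a minimal sketch; the dictionary and function names are illustrative and not part of the original methods, and the lengths are those stated in the text.

```python
# Deleted lengths (bp) as stated in the text for each in-frame deletion.
DELETIONS_BP = {
    "lmbE": 549, "lmbE80": 447, "lmbE447": 459, "lmbE3457": 402,
    "mshA": 633, "lmbT": 462, "lmbV": 399, "egtD": 465,
    "lmbC": 708, "lmbD": 459,
}

def deletion_summary(length_bp):
    """Report whether a deletion keeps the reading frame and how many codons it removes."""
    codons, remainder = divmod(length_bp, 3)
    return {"in_frame": remainder == 0, "codons_removed": codons}

if __name__ == "__main__":
    for gene, length in DELETIONS_BP.items():
        print(gene, length, deletion_summary(length))
```

A deletion whose length is not divisible by 3 would shift the frame of the remaining coding sequence, which is the polar effect that the in-frame strategy is meant to avoid.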

For site-specific mutation of lmbN, the 2.01-kb fragment obtained using primers
lmbN-L-for and lmbN-L-rev was digested by EcoRI and NheI, and cloned into the
same sites of pKC1139 to yield plasmid pLL1033. The 1.94-kb fragment obtained
using primers lmbN-R-for and lmbN-R-rev was digested by NheI and HindIII and
cloned into the same site of pLL1033 to yield the recombinant plasmid pLL1034,
which encoded an S37A mutation that replaced the conserved motif XXXDSL with
XXXDAL and had a mutation that replaced CTC (encoding L38) with CTA (resulting
in no coding change) to introduce the NheI site. Following the procedure described
above, pLL1034 was introduced into the S. lincolnensis wild-type strain for
double-crossover recombination. The resulting strain LL1017 was then subjected to PCR
amplification to give a 0.86-kb product (using primers lmbN-gt-for and
lmbN-gt-rev). NheI digestion was then carried out on this PCR product to determine the
genotype, as judged by the release of the desired 0.11-kb and 0.75-kb fragments.
Further sequencing of this PCR product confirmed the genotype of LL1017, in
which lmbN was site-specifically mutated.
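
The two consistency checks implied by this design can be sketched as follows: that the CTC to CTA change is silent (both codons encode leucine in the standard genetic code), and that the two NheI fragments add up to the undigested PCR product. The helper names and the 0.01-kb tolerance are assumptions made for illustration only.

```python
# Leucine codons in the standard genetic code.
LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}

def is_silent_leu_change(old_codon, new_codon):
    """True if both codons encode leucine, so the change does not alter the protein."""
    return old_codon.upper() in LEU_CODONS and new_codon.upper() in LEU_CODONS

def fragments_match(product_kb, fragments_kb, tol_kb=0.01):
    """True if the digest fragments sum to the undigested product within a tolerance."""
    return abs(sum(fragments_kb) - product_kb) <= tol_kb

if __name__ == "__main__":
    print(is_silent_leu_change("CTC", "CTA"))    # L38 codon change from the text
    print(fragments_match(0.86, [0.11, 0.75]))   # NheI digest of the 0.86-kb product
```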

For complementation of lmbT in LL1006, the 1.5-kb lmbT-containing fragment
was amplified by PCR using primers lmbT-C-for and lmbT-C-rev, and then
cloned into pMD19-T to yield pLL1015. After digestion with BamHI and XbaI, this
1.5-kb DNA fragment was recovered from pLL1015 and ligated to a 0.45-kb EcoRI/
BamHI fragment from pWHM79, and the resulting product was ligated into the
EcoRI/XbaI site of pKC1139, yielding the recombinant plasmid pLL1016, in which
lmbT was under the control of the constitutive promoter PermE*. pLL1016 was
then introduced into LL1006 (ΔlmbT mutant) by conjugation, generating the
corresponding recombinant strain LL1008 that expressed lmbT in trans.

For complementation of lmbV in LL1007, a 1.03-kb lmbV-containing fragment
was amplified by PCR using primers lmbV-C-for and lmbV-C-rev, and then cloned
into pMD19-T to yield pLL1017. After digestion with BamHI and XbaI, this 1.03-kb
DNA fragment was recovered from pLL1017 and ligated to a 0.45-kb EcoRI/
BamHI fragment from pWHM79, and the resulting product was then ligated into
the EcoRI/XbaI site of pKC1139, yielding the recombinant plasmid pLL1018, in
which lmbV was under the control of the constitutive promoter PermE*. pLL1018
was then introduced into LL1007 (ΔlmbV mutant) by conjugation, generating the
corresponding recombinant strain LL1009 that expressed lmbV in trans.

For heterologous complementation of ccbV in LL1007, the 1.15-kb ccbV-containing
fragment was amplified from the genomic DNA of the celesticetin-producing strain
S. caelestis NRRL2418 by PCR using primers ccbV-C-for and ccbV-C-rev, and
then cloned into pMD19-T to yield pLL1021. After digestion with BglII and XbaI,
this 1.03-kb DNA fragment was recovered from pLL1021 and subsequently ligated
to a 0.45-kb EcoRI/BamHI fragment from pWHM79; the resulting product was
ligated into the EcoRI/XbaI site of pKC1139, yielding the recombinant plasmid
pLL1022, in which ccbV was under the control of the constitutive promoter PermE*.
pLL1022 was then introduced into LL1007 (ΔlmbV mutant) by conjugation,
generating the corresponding recombinant strain LL1011 that heterologously expressed
ccbV in trans.
Production and analysis of lincomycin A, intermediate or shunt product. The 
S. lincolnensis wild-type strain or its derivative was spread on agar plates, which
were composed of 19 g of starch, 5 g of soybean meal, 0.5 g of K2HPO4, 0.5 g of
MgSO4·7H2O, 1.0 g of KNO3, 0.5 g of NaCl, 0.01 g of FeSO4·7H2O and 20.0 g of
agar per litre (pH 7.0–7.5), and then incubated at 28 °C for sporulation and
growth. Approximately 1 cm² of the sporulated agar of S. lincolnensis was cut,
chopped, and inoculated into 25 ml of the seed medium, which was composed of
20 g of starch, 10 g of glucose, 10 g of soybean meal, 30 g of corn steep liquor, 1.5 g
of (NH4)2SO4 and 5 g of CaCO3 per litre (pH 7.0). After incubation at 28 °C and
220 rpm for 36 h, 5 ml of the seed culture broth was transferred into 50 ml of the
fermentation medium, which was composed of 100 g of glucose, 25 g of soybean
meal, 2 g of corn steep liquor, 8 g of (NH4)2SO4, 0.2 g of KH2PO4, 8 g of NaNO3, 5 g
of NaCl and 8 g of CaCO3 per litre (pH 7.0). Further incubation was carried out at
28 °C and 220 rpm for 7 days.

For product examination, 500 µl of each fermentation broth was mixed with an
equal volume of methanol, and after centrifugation to remove the residue, the
supernatant was subjected to HPLC analysis on an Agilent Zorbax column (SB-C18,
5 µm, 4.6 × 250 mm, Agilent Technologies Inc., USA) by isocratic elution of 40%
5 mM ammonium acetate (NH4Ac) and 60% MeOH for 20 min at a flow rate of
0.6 ml min−1. HPLC-ESI-MS analysis of the supernatant diluted five times with
50% MeOH was carried out on the same column by gradient elution of solvent A
(10 mM NH4Ac) and solvent B (CH3CN) at a flow rate of 1 ml min−1 over a 22-min
period as follows: t = 0 min, 10% B; t = 9 min, 10% B; t = 17 min, 60% B; t = 18 min,
60% B; t = 20 min, 10% B; and t = 22 min, 10% B (mAU at 210 nm). The data were
analysed using Thermo Xcalibur software. The concentration of lincomycin A was
estimated by HPLC using the commercially available product as the standard.
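
The 22-min gradient programme listed above can be written as a table of time points and read back at any time. This is a minimal sketch that assumes linear ramps between the listed points, which is the usual behaviour of a binary gradient pump but is not stated explicitly in the text; the names `GRADIENT` and `percent_b` are illustrative.

```python
# Gradient table from the HPLC-ESI-MS method: (time in min, % solvent B).
GRADIENT = [(0, 10), (9, 10), (17, 60), (18, 60), (20, 10), (22, 10)]

def percent_b(t_min, table=GRADIENT):
    """Return %B at time t_min, assuming linear ramps between table points."""
    if not table[0][0] <= t_min <= table[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

if __name__ == "__main__":
    for t in (0, 9, 13, 17, 19, 22):
        print(t, percent_b(t))
```

Solvent A at any time is simply 100 minus the returned value.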

For biotransformation of 2 (the mercapturic acid derivative) into lincomycin A
by the cell homogenate of the ΔmshAlin mutant strain, LL1005, the mycelia from
50 ml of the 6-day fermentation culture broth were collected, washed twice, and
then re-suspended in 10 ml of 50 mM Tris-HCl buffer (pH 7.5). Sonication for
20 min on ice, followed by centrifugation at 4 °C to remove the cellular debris, resulted
in the supernatant that was used for the conversion of 2. Each biotransformation
was conducted in 100 µl of the mixture, which contained 1 µl of 2 (giving a final
concentration of 0.20 mM) and 99 µl of the supernatant of LL1005, and incubated
at 30 °C for 0, 1 or 4 h. After quenching the reaction by addition of an equal volume
of methanol, 20 µl of each biotransformation mixture was centrifuged and subjected
to HPLC-ESI-MS analysis of 2 consumption and of lincomycin A production.

For feeding of the MSH S-conjugate 1 to LL1005 (ΔmshAlin), 1 was added into
the culture broth of LL1005 on the 4th day of fermentation (giving a final
concentration of approximately 0.20 mM). In a further 5-day incubation period, the
production of lincomycin A was monitored daily by HPLC and HPLC-ESI-MS as
described above.
Production and analysis of MSH and EGT in S. lincolnensis. Approximately 
1 cm² of the sporulated agar of S. lincolnensis or its derivative was cut, chopped,
and inoculated into 25 ml of YEME medium and then incubated at 28 °C and
220 rpm for 36h. 5 ml of the resulting culture broth was added into 100 ml of the 
same medium for scale-up and then further incubated at 28 °C and 220 rpm for 
3 days. The mycelia were then harvested by centrifugation at 4°C and 5,000 rpm 
for 15 min. 

For thiol extraction, derivatization and detection, the procedure described by
Fahey and Newton was used with slight modifications. Approximately 200 mg
(wet weight) of the freshly harvested mycelia was weighed in a 5 ml microcentrifuge
tube and then re-suspended in 2 ml of a mixture of 50% CH3CN and 50%
2 mM monobromobimane (mBBr) dissolved in 20 mM Tris-HCl (pH 8.0) buffer
for sonication. The mixture was incubated in the dark at 60 °C for 15 min, acidified
with 5 µl of methanesulfonic acid (5 N) and then centrifuged at 12,000 rpm for
10 min to remove the debris before storing at −80 °C. HPLC-ESI-HR-MS analysis
of the resulting thiol-mBBr derivatives was carried out on an Agilent Zorbax column
(Extend-C18, 1.8 µm, 2.1 × 50 mm, Agilent Technologies Inc., USA) by gradient
elution of solvent A (H2O) and solvent B (CH3CN) at a flow rate of 0.2 ml min−1
over a 15-min period as follows: t = 0 min, 5% B; t = 5 min, 5% B; t = 10 min, 50% B;
t = 12 min, 50% B; t = 13 min, 5% B; and t = 15 min, 5% B (mAU at 370 nm).
Protein expression and purification. The recombinant proteins LmbE, CcbV, 
LmbT, LmbC, LmbN, LmbN-PCP and CcbD, all of which were in an N-terminal 
6×His-tagged form, were overproduced in E. coli BL21(DE3) and purified by Ni-
affinity followed by desalting. 

The genes of recombinant proteins LmbE, CcbV, LmbT, LmbC, LmbN, LmbN- 
PCP and CcbD were PCR amplified from S. lincolnensis or S. caelestis genomic 
DNA using primers with engineered NdeI and HindIII restriction sites. The PCR-
amplified gene fragments were purified, digested with NdeI and HindIII and ligated
into a pET28a(+) vector (Novagen) that had been digested with the same enzymes. 
The resultant plasmids were used to transform E. coli BL21(DE3) for protein over- 
expression. The E. coli transformant cultures were incubated in Luria-Bertani (LB) 
medium containing 50 µg ml−1 kanamycin at 37 °C and 250 rpm until the cell density
reached 0.6-0.8 at OD600nm. To induce protein expression, IPTG (0.1 mM) was
added to the cultures, which were further incubated at 16 °C for 40-48 h. The cells 
were harvested by centrifugation and stored at —80 °C before lysis. The thawed cells 
were re-suspended in lysis buffer containing 50 mM Tris-HCl (pH 7.6), 300 mM 
NaCl, 10 mM imidazole and 10% (v/v) glycerol. After disruption by a low-temperature 
ultra-high-pressure cell disrupter (FB-110X, Shanghai Litu Mechanical Equipment 
Engineering Co., Ltd, China or JN-02HC, JNBIO, China), the soluble fraction was 
collected, subjected to purification of each target protein by using a HisTrap FF 
column (GE Healthcare, USA) and then desalted using a PD-10 Desalting Column 
(GE Healthcare, USA) according to the manufacturers’ protocols. The resulting 
proteins were concentrated and stored at −80 °C for in vitro assays. The purity of
the proteins was examined by 12% SDS-PAGE analysis, and the concentration was 
determined by Bradford assay using bovine serum albumin (BSA) as the standard. 
For LmbN-PCP, HPLC-HR-ESI-Q-TOF MS (Agilent Technologies Inc., USA) 
analysis indicated that the recombinant protein purified from E. coli BL21(DE3) 
was fully phosphopantetheinylated into a holo form (m/z [M + H]+ calculated
11,842.94, found 11,843.42). 

Determination of the amidase activity of LmbE in vitro. The assays were carried 
out at 30 °C for 5 h in a 50 µl reaction mixture containing 50 mM Tris-HCl (pH 7.5)
and 2 mM substrate (MSH S-conjugate) in the presence of 2 µM (for 1) or 20 µM
(for 6) LmbE. Reactions in the absence of the enzyme were used as negative controls.
An equal volume of methanol was added into each mixture to terminate the
reaction. After removal of the denatured protein by centrifugation, the reaction
mixtures were subjected to HPLC-ESI-MS analysis on a Phenomenex Luna C18(2)
column (5 µm, 4.6 × 250 mm, USA) by isocratic elution of 90% 10 mM NH4Ac
and 10% CH3CN for a 15-min period at a flow rate of 1 ml min−1.
Characterization of the CcbV-catalysed reaction in vitro. The assays were 
carried out at 30 °C for 2 h in a 50 µl reaction mixture. For substrate 4, the mixture
contained 50 mM Tris-HCl (pH 8.0), 1 mM MSH, 2 mM TCEP, 1 mM EGT S-conjugate
(4) and 40 µM CcbV. For substrate 5, the mixture contained 50 mM Tris-HCl
(pH 8.0), 2 mM MSH, 2 mM TCEP, 2 mM EGT S-conjugate (5) and 100 µM CcbV.
Reactions in the absence of the enzyme were used as negative controls. The ter- 
mination of each reaction and analysis of the resulting MSH S-conjugate and EGT 
were carried out according to the methods described above for LmbE-catalysed 
conversion. HPLC analysis of EGT production was performed on a COSMOSIL 
HILIC Packed Column (5 µm, 4.6 × 250 mm, Nacalai Tesque Inc., Japan) by
isocratic elution of 30% 10 mM NH4Ac and 70% CH3CN for 20 min at a flow rate of
1 ml min−1 (mAU at 260 nm). The commercially available EGT (Enzo Life Sciences
Inc., USA) was used as the standard. 

To evaluate the pH dependence, each reaction was performed in triplicate at
30 °C for 1 h in a 25 µl reaction mixture containing 1 mM MSH, 2 mM TCEP,
1 mM 4 and 20 µM CcbV in 50 mM PIPES (pH 6.0-7.0) or 50 mM Tris-HCl
(pH 7.5-9.0) buffer.

For the kinetic analysis, a time course was carried out to determine the initial 
rate conditions in 25 µl of reaction mixture containing 50 mM Tris-HCl buffer
(pH 8.0), 1 mM MSH, 1 mM TCEP, 1 mM 4 and 20 µM CcbV. The reactions were
initiated by the addition of CcbV, incubated at 30 °C, and then terminated by 25 µl
of methanol at 3, 5, 7, 10, 16, 25, 45 and 60 min. The samples were subjected to the 
same work-ups and HPLC analysis of EGT production as described above. The 
production of EGT, linear with respect to time during 0-20 min, was fitted into a 
linear equation to obtain the initial velocity. To determine the kinetic parameters 
for substrate 4, the reactions were carried out at 30 °C for 20 min, each in 25 µl of
mixture containing 50 mM Tris-HCl buffer (pH 8.0), 2 mM MSH, 2 mM TCEP,
and 20 µM CcbV, and varying the concentration of substrate 4 (0.02, 0.05, 0.10,
0.20, 0.50, 1.00 and 2.00 mM). To determine the kinetic parameters for the
substrate MSH, the reactions were carried out at 30 °C for 20 min, each in 25 µl of
mixture containing 50 mM Tris-HCl buffer (pH 8.0), 5 mM 4, 2 mM TCEP, and
20 µM CcbV, and varying the concentration of substrate MSH (0.02, 0.05, 0.10,
0.20, 0.50 and 1.00 mM). All assays were performed in triplicate, and each con- 
version was analysed by HPLC for EGT production as described above. The resulting 
initial velocities were then fitted to the Michaelis-Menten equation using GraphPad 
Prism5 software (GraphPad Software, Inc., USA) to extract the Km and kcat parameters.
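
The fitting step described above can be illustrated without GraphPad Prism. The sketch below fits v = Vmax·[S]/(Km + [S]) by least squares in pure Python: for any trial Km the best Vmax has a closed form, so only Km is searched (golden-section search on log Km). This is an illustrative stand-in, not the algorithm Prism uses, and the velocities are synthetic values generated from assumed parameters (Km = 0.25 mM, Vmax = 0.04 mM min−1), not data from the paper. Only the substrate concentrations and the 20 µM enzyme concentration come from the text.

```python
import math

def fit_michaelis_menten(s, v, iters=200):
    """Least-squares fit of v = vmax*s/(km+s); returns (km, vmax)."""
    def vmax_for(km):
        # For fixed km the model is linear in vmax, so vmax has a closed form.
        x = [si / (km + si) for si in s]
        return sum(vi * xi for vi, xi in zip(v, x)) / sum(xi * xi for xi in x)

    def sse(km):
        vm = vmax_for(km)
        return sum((vi - vm * si / (km + si)) ** 2 for si, vi in zip(s, v))

    # Golden-section search on log10(km); assumes a single minimum in the range.
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log10(min(s) / 100), math.log10(max(s) * 100)
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if sse(10 ** c) < sse(10 ** d):
            b = d
        else:
            a = c
        c, d = b - phi * (b - a), a + phi * (b - a)
    km = 10 ** ((a + b) / 2)
    return km, vmax_for(km)

def kcat(vmax, enzyme_conc):
    """Turnover number; vmax and enzyme_conc must share concentration units."""
    return vmax / enzyme_conc

# Substrate 4 concentrations (mM) from the text; velocities are SYNTHETIC.
S_MM = [0.02, 0.05, 0.10, 0.20, 0.50, 1.00, 2.00]
TRUE_KM, TRUE_VMAX = 0.25, 0.04          # assumed values for illustration
V_SYNTH = [TRUE_VMAX * s / (TRUE_KM + s) for s in S_MM]

if __name__ == "__main__":
    km, vmax = fit_michaelis_menten(S_MM, V_SYNTH)
    print("Km (mM):", km, "Vmax (mM/min):", vmax)
    print("kcat (1/min):", kcat(vmax, 0.020))  # 20 uM CcbV expressed in mM
```

With real triplicate data, the mean initial velocity at each substrate concentration (or all replicates together) would replace `V_SYNTH`.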
Characterization of the LmbT-catalysed reaction in vitro. The reverse glycosylation
assays of LmbT were carried out at 30 °C for 2 h in a 50 µl reaction mixture.
For substrate 5, the mixture contained 50 mM Tris-HCl (pH 7.5), 2 mM MgCl2,
1 mM 5, 2 mM GDP and 4 µM LmbT. For substrate 4, the mixture contained 50 mM
Tris-HCl (pH 7.5), 2 mM MgCl2, 2 mM 4, 4 mM GDP and 20 µM LmbT. The
forward glycosylation assays were carried out at 30 °C for 2 h in a 50 µl reaction
mixture containing 50 mM Tris-HCl (pH 7.5), 2 mM MgCl2, 1 mM 3, 1 mM EGT
and 4 µM LmbT. Reactions in the absence of the enzyme were used as negative
controls. The termination of each reaction was carried out according to the methods
described above for LmbE-catalysed conversion. The reaction mixtures containing
5 were subjected to HPLC-ESI-MS analysis on a Phenomenex Luna C18(2)
column by isocratic elution of 95% 10 mM NH4Ac and 5% CH3CN for 15 min at a
flow rate of 1 ml min−1.

To evaluate the pH dependence, each reaction was performed in triplicate at
30 °C for 1 h in 25 µl of reaction mixture containing 2 mM MgCl2, 4 mM GDP,
2 mM 5 and 4 µM LmbT in 50 mM PIPES (pH 6.0-7.0) or 50 mM Tris-HCl
(pH 7.5-9.0) buffer.

For the kinetic analysis, a time course was carried out to determine the initial 
rate conditions in 25 µl of reaction mixture containing 50 mM Tris-HCl (pH 7.5),
2 mM MgCl2, 1 mM 5, 2 mM GDP and 4 µM LmbT. The reactions were initiated
by the addition of LmbT, incubated at 30 °C, and then terminated by 25 µl of
methanol at 3, 5, 7, 10, 16, 25, 45 and 60 min. The samples were subjected to the
same work-ups and HPLC analysis of EGT production as described above. The
production of EGT, linear with respect to time during 0-20 min, was fitted into a linear
equation to obtain the initial velocity. To determine the kinetic parameters for
substrate 5, the reactions were carried out at 30 °C for 20 min, each in 25 µl of mixture
containing 50 mM Tris-HCl buffer (pH 7.5), 2 mM MgCl2, 2 mM GDP, and 4 µM
LmbT, and varying the concentration of substrate 5 (0.02, 0.05, 0.10, 0.20, 0.50,
1.00, 2.00 and 4.00 mM). To determine the kinetic parameters for substrate GDP,
the reactions were carried out at 30 °C for 20 min, each in 25 µl of mixture
containing 50 mM Tris-HCl buffer (pH 7.5), 2 mM MgCl2, 4 mM 5, and 4 µM LmbT,
and varying the concentration of substrate GDP (0.02, 0.05, 0.10, 0.20, 0.50, 1.00
and 2.00 mM). All assays were performed in triplicate, and each conversion was
analysed by HPLC for EGT production as described above. The resulting initial
velocities were then fitted to the Michaelis-Menten equation using GraphPad Prism5
software (GraphPad Software, Inc., USA) to extract the Km and kcat parameters.

To determine the equilibrium constant (Keq) of the LmbT-catalysed reaction,
the experiment was performed according to the method described previously. Keq 
was measured by performing a series of saturated reactions, in which the concen- 
tration ratio of [GDP]/[3] varied from 1/3 to 5 on the premise that the ratio of [5]/ 
[EGT] was fixed at 1. The total concentrations of [3] + [GDP] and [5] + [EGT] were 
both kept at 4mM. The reaction was performed in a 25 ul mixture containing 
50 mM Tris-HCl (pH 7.5), 2mM MgCl, and 4 uM LmbT at 30°C for 3 h to reach 
equilibrium. The change of [EGT] was monitored by HPLC as described previously 
and plotted against the ratio of [GDP]/[3]. The equilibrium constant was
subsequently determined from the equation Keq = ([GDP]/[3]) × ([5]/[EGT]).
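
One way to read Keq off such a plot is sketched below. It rests on an assumption that is not spelled out in the text: that Keq is taken at the [GDP]/[3] ratio where the net change in [EGT] is zero, located here by linear interpolation between the two bracketing points. The function names and the example numbers are hypothetical and are not data from the paper.

```python
def zero_crossing(ratios, delta_egt):
    """Linearly interpolate the [GDP]/[3] ratio at which the change in [EGT] is zero."""
    pairs = sorted(zip(ratios, delta_egt))
    for (r0, d0), (r1, d1) in zip(pairs, pairs[1:]):
        if d0 == 0:
            return r0
        if d0 * d1 < 0:  # sign change between neighbouring points
            return r0 + (0 - d0) * (r1 - r0) / (d1 - d0)
    if pairs[-1][1] == 0:
        return pairs[-1][0]
    raise ValueError("no sign change: equilibrium ratio lies outside the tested range")

def keq(ratio_gdp_over_3, ratio_5_over_egt=1.0):
    """Keq = ([GDP]/[3]) x ([5]/[EGT]); the second ratio was fixed at 1 in the text."""
    return ratio_gdp_over_3 * ratio_5_over_egt

if __name__ == "__main__":
    # Hypothetical changes in [EGT] (mM) at each starting [GDP]/[3] ratio.
    ratios = [1 / 3, 1, 2, 3, 5]
    delta = [-0.30, -0.10, 0.05, 0.15, 0.30]
    print(keq(zero_crossing(ratios, delta)))
```

A positive change in [EGT] at high [GDP]/[3] reflects net reaction in the reverse direction (5 and GDP converted to 3 and EGT), and a negative change at low ratios reflects net forward reaction.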
Characterization of PPL incorporation in vitro. To convert holo-LmbN-PCP 
into PPL-acylated LmbN-PCP, the reaction was carried out at 30 °C for 3 h in a
50 µl reaction mixture containing 50 mM Tris-HCl (pH 7.5), 2 µM LmbC, 2 mM
PPL, 5 mM ATP, 10 mM MgCl2, 2 mM TCEP, and 1 mM CoA in the presence of
100 µM LmbN-PCP. Reactions in the absence of LmbC or ATP were used as
negative controls. For product examination, the reaction mixtures were subjected to
HPLC-HR-ESI-Q-TOF MS (Agilent Technologies Inc., USA) analysis on an Agilent
Zorbax column by gradient elution of solvent A (H2O containing 0.1% formic acid)
and solvent B (CH3CN containing 0.1% formic acid) at a flow rate of 0.2 ml min−1
over a 45-min period as follows: t = 0 min, 10% B; and t = 45 min, 95% B (mAU at
210 nm).

For PPL incorporation, the assays were carried out at 30 °C for 3 h in a 50 µl
reaction mixture containing 50 mM Tris-HCl (pH 7.5), 2 µM LmbC, 2 mM PPL,
5 mM ATP, 10 mM MgCl2, 2 mM TCEP, 1 mM CoA, and 2 mM substrate 5 (MSH
S-conjugate) in the presence of 50 µM LmbN or LmbN-PCP and 10 µM CcbD.
Reactions in the absence of each enzyme were used as negative controls. The ter- 
mination of each reaction and HPLC-ESI-MS analysis was carried out according to 
the methods described above for LmbE-catalysed conversion. 
Compound isolation and purification. For compounds 1 and 4 (from the ΔlmbE
and ΔlmbV mutant strain, respectively), 100 g of Amberlite XAD-2 resin (Rohm
and Haas Co., USA) was incubated with 1 l of each fermentation culture broth
overnight to absorb the target compound. After filtration, the resin was backwashed
with 2 l of water, and then eluted with 3 l of MeOH. The eluent was evaporated in vacuo
to a crude extract. The resultant residue was re-dissolved in 5 ml of water, and then
loaded onto a Sephadex LH20 column (3.5 × 200 cm, GE Healthcare, USA) by
eluting 500 ml of MeOH at a flow rate of 0.5 ml min−1. According to ESI-MS analysis,
the fractions containing the target compound were combined, evaporated in vacuo
and then loaded onto an Agilent Zorbax column (SB-C18, 5 µm, 9.4 × 250 mm,
Agilent Technologies Inc., USA) by isocratic elution of 40% 5 mM NH4Ac and 60%
MeOH for 12 min at a flow rate of 2 ml min−1 (mAU at 210 nm). After a similar
work-up for fractionation and concentration, further purification was carried out on a
COSMOSIL HILIC Packed Column (5 µm, 10 × 250 mm, Nacalai Tesque Inc.,
Japan) by isocratic elution of 21% 10 mM NH4Ac and 79% CH3CN for 70 min at a
flow rate of 2 ml min−1 (mAU at 210 nm).

For PPL (from the ΔlmbT mutant strain), after a similar procedure with Amberlite
XAD-2 resin, further HPLC semi-preparation purification was carried out twice on
an Agilent Zorbax column by isocratic elution of 20% 5 mM NH4Ac and 80% MeOH
for 30 min at a flow rate of 2 ml min−1 (mAU at 210 nm).

For compounds 5 and 6 (from the ΔlmbC mutant strain), 100 g of Amberlite
XAD-16 resin (Rohm and Haas Co., USA) was incubated with 1 l of fermentation
culture broth overnight to remove most of the impurity. After filtration and
concentration, the resultant residue was then loaded onto a Sephadex LH20 column by
eluting 500 ml of H2O:MeOH (1:1) at a flow rate of 0.5 ml min−1. According to
ESI-MS analysis, the fractions containing compound 5 or 6 were combined,
evaporated in vacuo and then loaded onto a COSMOSIL HILIC Packed Column by
isocratic elution of 30% 10 mM NH4Ac and 70% CH3CN for 30 min at a flow rate
of 2 ml min−1 (mAU at 210 nm). For compound 6, further purification was carried
out on a Sephadex G10 column (1.5 × 120 cm, GE Healthcare, USA) by eluting
H2O at a flow rate of 0.2 ml min−1.

For compounds 2 and GlcN-Ins (from the LmbE-catalysed reaction), the assay 
was scaled up and carried out at 30 °C for 2 h in 20 ml of mixture containing 200 µM
LmbE, 1 mM pure compound 1 and 50 mM Tris-HCl buffer (pH 7.5). After filtration
with an ultra-filtration membrane (Amicon YM-30, Millipore) to remove protein,
the solution was loaded onto an Agilent Zorbax column by gradient elution of
solvent A (H2O) and solvent B (CH3CN) at a flow rate of 2 ml min−1 over an 18-min
period as follows: t = 0 min, 5% B; t = 5 min, 5% B; t = 10 min, 55% B; t = 16 min,
55% B; and t = 18 min, 5% B (mAU at 210 nm). The fraction containing GlcN-Ins
was concentrated and then loaded onto a COSMOSIL HILIC Packed Column by
isocratic elution of 25% 10 mM NH4Ac and 75% CH3CN for 40 min at a flow rate
of 2 ml min−1. GlcN-Ins was examined using a refractive index (RI) detector. For
compound 2, further purification was carried out on the same column by the same 
isocratic elution for 60 min (mAU at 210 nm). 
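
The gradient programme in the preceding paragraph can be read as a piecewise-linear schedule for solvent B. The following is a minimal Python sketch, assuming linear ramps between the listed time points (a common HPLC convention that the text does not state explicitly). The names `GRADIENT` and `percent_b` are illustrative and do not come from the original methods.

```python
# Gradient table for solvent B (CH3CN) as listed in the text: (time in min, % B).
GRADIENT = [(0.0, 5.0), (5.0, 5.0), (10.0, 55.0), (16.0, 55.0), (18.0, 5.0)]


def percent_b(t_min, table=GRADIENT):
    """Return % B at time t_min, ASSUMING linear ramps between table points."""
    if not table[0][0] <= t_min <= table[-1][0]:
        raise ValueError("time lies outside the gradient programme")
    # Walk through consecutive pairs of points and interpolate within the
    # segment that contains t_min.
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 5, 7.5, 10, 16, 17, 18):
        print(f"t = {t:>4} min: {percent_b(t):.1f}% B")
```

Under this assumption the column sees 30% B halfway through the first ramp (t = 7.5 min) and again halfway through the final return to starting conditions (t = 17 min).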

For compound 3 (from the LmbT-catalysed reverse glycosylation reaction), the 
assay was scaled up and carried out at 30 °C for 5 h in 10 ml of mixture containing 
4 µM LmbT, 2 mM pure compound 5, 2 mM GDP, 1 mM MgCl₂ and 50 mM Tris-HCl 
buffer (pH 7.5). After filtration with an ultra-filtration membrane to remove 
protein, the solution was concentrated and then subjected to a Sephadex G10 
column by eluting with H₂O at a flow rate of 0.1 ml min⁻¹. 

Chemical synthesis of MSH. The synthesis was carried out according to the 
procedures described previously with slight modifications. For details, see 
Supplementary Methods. 
Extended Data Figure 1 | Constructive role of MSH (orange) and EGT (green) in lincomycin biosynthesis (shown as cartoon models). 
Labels in the figure: 'A Tale of Two Low-Molecular-Weight Thiols'; 'EGT-Templated Molecular Assembly'; 'MSH-Based Sulfur Incorporation'; 'Recycle'; 'Regeneration'; 'GDP-Octose, 3'; 'Lincomycin A'. 
Extended Data Figure 2 | Genes and/or clusters relevant to lincomycin 
biosynthesis in S. lincolnensis. a, Biosynthetic gene cluster of lincomycin A. 
The mca homologue lmbE and two GTase-encoding genes, lmbV and lmbT, are 
shown in yellow, blue and green, respectively. The genes responsible for PPL 
incorporation (lmbC, lmbD and lmbN) are shown in grey. b, The location of 
mshAlin (shown in purple), which is not clustered with the other genes 
responsible for MSH biosynthesis, in the genome of S. lincolnensis. The flanking 
genes lin5590, lin5591, lin5593 and lin5594 share sequence homology with 
genes encoding the methylase (WP_026151264.1) from S. prunicolor, the 
DUF899-like protein (WP_004004056.1) from S. viridochromogenes, the 
YbjN-like protein (WP_020130332.1) from Streptomyces sp. 303MFCol5.2 
and the hypothetical protein (WP_004000160.1) from S. viridochromogenes, 
respectively. c, The biosynthetic gene cluster of EGT. The gene egtDlin is shown 
in red. 
Extended Data Figure 3 | The production of MSH S-conjugate 1 and 
lincomycin A in various S. lincolnensis strains. The strains include the 
mutants (i, for ΔlmbE; ii, for ΔlmbE-E80; iii, for ΔlmbE-E447; and iv, for 
ΔlmbE-E3457) and the wild-type control (v). For HPLC-MS analysis, the ESI 
m/z [M + H]⁺ modes are indicated in the dashed rectangle. 
Extended Data Figure 4 | Analysis of MSH and EGT production in the 
ΔmshAlin and ΔegtDlin mutant S. lincolnensis strains. a, Derivatization of 
MSH and EGT with mBBr to generate the corresponding S-conjugates. 
b, HPLC-HR-MS analysis of MSH-mBBr (calculated for C₂₇H₄₁N₄O₁₄S⁺ 
[M + H]⁺ 677.2340) and EGT-mBBr (calculated for C₁₉H₂₆N₅O₄S⁺ 
[M + H]⁺ 420.1706) in the wild-type control (i), LL1005 (ii, ΔmshAlin mutant) 
and LL1010 (iii, ΔegtDlin mutant). c, The HR-ESI-MS spectra of EGT-mBBr 
(top) and MSH-mBBr (below). 
Extended Data Figure 5 | Phylogenetic analysis in the DinB-2 superfamily. 
a, LmbV (from S. lincolnensis) and CcbV (from S. caelestis), shown in red, with 
selected DinB-2-like proteins (from various actinomycetes, in which the thiol 
MSH is dominant) in the phylogenetic tree. The evolutionary distances were 
computed using the p-distance method. The support for grouping clades i, ii, iii, 
iv and v (shaded in different colours) is indicated by bootstrap values. The 
known MSH-maleylpyruvate isomerase Ncgl2918 is shown in blue. b, Typical 
domain organization of the DinB-2-like proteins. The conserved DinB-2 
domain is shown in green. Clade i features the C-terminal MDMPI-C domain. 
Proteins containing this domain include the MSH-maleylpyruvate isomerases, 
such as Ncgl2918. Clade ii features the C-terminal FGE-sulfatase domain, 
which is found in eukaryotic proteins required for post-translational 
modification to produce sulfatases, which are essential for the degradation and 
remodelling of sulfate esters. Clade iii features the N-terminal zf-HC2 domain, 
which contains a putative zinc-finger binding motif and is found in some 
anti-sigma factor proteins. Clade iv features the C-terminal SCP2 domain 
involved in binding sterols. Clade v features the C-terminal wyosine_f domain. 
Some proteins containing this domain appear to be important in wyosine base 
formation in a subset of phenylalanine-specific tRNAs. ‘Others’ indicate a 
number of DinB-2-like proteins that possess an unknown domain(s) either at 
the C terminus or at both the C and N termini of the proteins. 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Figure 6 | Kinetic analysis of CcbV-catalysed thiol exchange. 
a, pH dependence. The activity of CcbV in 50 mM PIPES (pH 6.0-7.0) or 
50 mM Tris-HCl (pH 7.5-9.0) buffer was measured. b, c, Determination of the 
steady-state kinetic parameters for substrate 4 and for MSH, respectively. The 
error bars are standard error of mean (n = 3). 
Values shown in the panels: a, relative activity plotted against pH. b, plotted 
against compound 4 concentration (mM): Km = 1.31 ± 0.18 mM, 
kcat = 0.29 ± 0.02 min⁻¹, kcat/Km = 0.22 mM⁻¹ min⁻¹. c, plotted against MSH 
concentration (mM): Km = 0.65 ± 0.12 mM, kcat = 0.28 ± 0.03 min⁻¹, 
kcat/Km = 0.44 mM⁻¹ min⁻¹. 
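
The parameters reported for panels b and c can be cross-checked against the Michaelis-Menten rate law, v/[E] = kcat[S]/(Km + [S]). The sketch below is illustrative only: it assumes simple Michaelis-Menten behaviour, it uses the rounded values printed in the figure, and the function names are hypothetical rather than part of the original analysis.

```python
def turnover_rate(s_mM, kcat_per_min, km_mM):
    """Rate per enzyme molecule (min^-1) at substrate concentration s_mM,
    assuming simple Michaelis-Menten kinetics."""
    return kcat_per_min * s_mM / (km_mM + s_mM)


def catalytic_efficiency(kcat_per_min, km_mM):
    """Specificity constant kcat/Km in mM^-1 min^-1."""
    return kcat_per_min / km_mM


# Rounded (kcat in min^-1, Km in mM) pairs as printed in the figure.
PARAMS = {"compound 4": (0.29, 1.31), "MSH": (0.28, 0.65)}

if __name__ == "__main__":
    for name, (kcat, km) in PARAMS.items():
        eff = catalytic_efficiency(kcat, km)
        half = turnover_rate(km, kcat, km)  # at [S] = Km the rate is kcat/2
        print(f"{name}: kcat/Km = {eff:.2f} mM^-1 min^-1, "
              f"rate at [S] = Km is {half:.3f} min^-1")
```

Recomputing from the rounded values gives about 0.22 mM⁻¹ min⁻¹ for compound 4 and about 0.43 mM⁻¹ min⁻¹ for MSH; the small difference from the reported 0.44 presumably reflects rounding of the published parameters.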
Extended Data Figure 7 | Characterization of LmbT-catalysed reverse and 
forward glycosylation. For HPLC-MS analysis, the ESI m/z [M + H]⁺ modes 
are indicated in the dashed rectangle. a, Examination of the acylated C8-sugar 
transfer in the presence of LmbT, which showed that LmbT was unable to 
utilize 4 as a substrate for reverse glycosylation to generate the predicted 
GDP-D-α-D-sugar 7 (left) in the absence (top right) and in the presence 
(lower right) of LmbT. b, Characterization of LmbT-catalysed forward 
glycosylation. LmbT used 3 as a substrate for glycosylation to generate the 
GDP-D-α-D-sugar 5 (left) in the absence (top right) and in the presence 
(lower right) of LmbT. 
Extended Data Figure 8 | Characterization of PPL incorporation in 
lincomycin A biosynthesis. For HPLC-MS analysis, the ESI m/z [M + H]⁺ 
modes are indicated in the dashed rectangle. a, In vivo product profiles of 
S. lincolnensis strains, including the mutants (i, for ΔlmbC; ii, for ΔlmbN; iii, for 
ΔlmbD; and iv, for ΔlmbT) and the wild-type control (v). b, Process of the 
incorporation of PPL (with EGT S-conjugate 5) into intermediate 4. PCP 
(blue), peptidyl carrier protein; and I (grey), isomerase. c, HPLC-MS 
examination of LmbC-catalysed conversion of holo-LmbN-PCP 
(m/z [M + H]⁺ calculated 11,842.94, found 11,843.42) into PPL-acylated 
LmbN-PCP (m/z [M + H]⁺ calculated 11,982.01, found 11,983.21) in the 
absence (left) and in the presence (right) of ATP. d, In vitro analysis of the 
condensation between PPL and 5 to generate 4. The catalyst systems included 
LmbC + CcbD (i), LmbC + LmbN (ii), LmbC + LmbN-PCP (iii), 
LmbC + LmbN + CcbD (iv), and LmbC + LmbN-PCP + CcbD (v). 
Extended Data Figure 9 | Kinetic analysis of LmbT-catalysed reversible 
glycosylation. a, pH dependence. The activity of LmbT in 50 mM PIPES … 
b, c, Determination of the steady-state kinetic parameters for substrate 5 and for 
GDP, respectively, in LmbT-catalysed reverse glycosylation. The error bars 
are standard error of mean (n = 3). d, Determination of the equilibrium 
constant (Keq) of LmbT-catalysed glycosylation. Keq = ([GDP]/[3]) × ([5]/ 
[EGT]) = 1.94 × 1 = 1.94. 
Values shown in the panels: for compound 5 (concentration axis in mM), 
Km = 1.83 ± 0.18 mM, kcat = 6.94 ± 0.31 min⁻¹ and kcat/Km = 3.8 mM⁻¹ min⁻¹. 
Panel d shows the LmbT reaction 3 + EGT ⇌ 5 + GDP, with [GDP]/[3] on the 
horizontal axis. 
Extended Data Figure 10 | Validation of the highly ordered process 
involving EGT-mediated assembly and MSH-associated post-modifications 
in lincomycin biosynthesis. For HPLC-MS analysis, the ESI m/z [M + H]⁺ 
modes are indicated in the dashed rectangle. a, Examination of the thiol 
exchange using 5 as a substrate in the absence (top right) and in the presence 
(bottom right) of CcbV, showing that CcbV was able to convert 5 to MSH 
S-conjugate 6 (left). b, Determination of the hydrolysis reaction in the absence 
(top right) and in the presence (bottom right) of LmbE, showing that this 
enzyme was unable to utilize 6 as a substrate and convert it to the predicted 
mercapturic acid 8 (left). 
Structure and function of a single-chain, 
multi-domain long-chain acyl-CoA carboxylase 


Timothy H. Tran, Yu-Shan Hsiao, Jeanyoung Jo, Chi-Yuan Chou, Lars E. P. Dietrich, Thomas Walz & Liang Tong 


Biotin-dependent carboxylases are widely distributed in nature and 
have important functions in the metabolism of fatty acids, amino 
acids, carbohydrates, cholesterol and other compounds. Defective 
mutations in several of these enzymes have been linked to serious 
metabolic diseases in humans, and acetyl-CoA carboxylase is a tar- 
get for drug discovery in the treatment of diabetes, cancer and other 
diseases. Here we report the identification and biochemical, structural 
and functional characterizations of a novel single-chain (120 kDa), 
multi-domain biotin-dependent carboxylase in bacteria. It has pref- 
erence for long-chain acyl-CoA substrates, although it is also active 
towards short-chain and medium-chain acyl-CoAs, and we have named 
it long-chain acyl-CoA carboxylase. The holoenzyme is a homo-hexamer 
with molecular mass of 720 kDa. The 3.0 Å crystal structure of the 
long-chain acyl-CoA carboxylase holoenzyme from Mycobacterium 
avium subspecies paratuberculosis revealed an architecture that is strikingly 
different from those of related biotin-dependent carboxylases. 
In addition, the domains of each monomer have no direct contact 
with each other. They are instead extensively swapped in the holoen- 
zyme, such that one cycle of catalysis involves the participation of four 
monomers. Functional studies in Pseudomonas aeruginosa suggest 
that the enzyme is involved in the utilization of selected carbon and 
nitrogen sources. 

The reactions catalysed by biotin-dependent carboxylases proceed in 
two steps and involve at least three different protein components (Ex- 
tended Data Fig. 1). In the first step, a biotin carboxylase (BC) component 
catalyses the carboxylation of the biotin cofactor, which is covalently 
linked to the biotin carboxyl carrier protein (BCCP) component. In the 
second step, the carboxylated biotin translocates to the carboxyltrans- 
ferase (CT) active site and transfers the carboxyl group to the substrate. 
In bacteria, acetyl-CoA carboxylase (ACC) has been well characterized 
as a multi-subunit enzyme, with a BC subunit, a BCCP subunit and two 
subunits (α and β) for the CT activity (Extended Data Fig. 1). In contrast, 
ACC is a large (~250 kDa), single-chain, multi-domain enzyme in 
most eukaryotes (Extended Data Fig. 1), with domains that are homo- 
logous to the bacterial subunits. Other members of this family include 
propionyl-CoA carboxylase (PCC), 3-methylcrotonyl-CoA carboxylase 
(MCC), pyruvate carboxylase (PC) and urea carboxylase (UC) 
(Extended Data Fig. 1). 

By examining the sequence database, we identified a novel single-chain 
(~120 kDa), multi-domain biotin-dependent carboxylase in bacteria. 
The enzyme contains a BC domain at the amino terminus, a BCCP domain 
near the middle, and a CT domain that is homologous to that of ACC 
and PCC (Fig. 1a and Extended Data Fig. 1). Homologues of this enzyme 
are found in a large number of Gram-negative and Gram-positive bacteria, 
such as Rhodopseudomonas palustris, Mycobacterium avium subspecies 
paratuberculosis and the human pathogen Pseudomonas aeruginosa 
(Extended Data Fig. 2), with highly conserved sequences (Extended 
Data Fig. 3). These homologues are in fact mis-annotated as PC or 
carbamoyl-phosphate synthase (CPS) in the database, as was noted in 
an earlier report, probably because they have roughly the same size as 
PC and CPS. The CT domain of PC has a completely different sequence 
and structure (Extended Data Fig. 1), whereas CPS is not a biotin-dependent 
enzyme and does not have a BCCP domain. These single-chain 
enzymes are somewhat related to a family of acyl-CoA carboxylases 
that have been characterized in Mycobacterium tuberculosis and other 
actinomycetes, in which BC and BCCP are present in one subunit and 
CT is in a separate subunit (Extended Data Fig. 1; see below). 

We overexpressed several of these single-chain enzymes in Escherichia 
coli and purified them to homogeneity. The proteins migrated at the same 
position on a gel filtration column as the 750 kDa α6β6 holoenzymes of 
PCC and MCC, suggesting that these enzymes are hexamers, with a 
molecular mass of ~720 kDa for the holoenzyme. 

We characterized the catalytic activities of the enzyme from R. palustris, 
which is annotated as both PC and CPS in the database. Consistent with 
the sequence analysis was our finding that this enzyme is an acyl-CoA 
carboxylase, and we did not observe any PC activity for it. The enzyme 
is active towards all the acyl-CoAs that we tested, with chain lengths from 
C₂ to C₁₆, but it prefers long-chain substrates (Extended Data Table 1). 
The kcat values for all the substrates are comparable, while the Km for 
palmitoyl-CoA is ~350-fold lower than that for acetyl-CoA. The high 
sequence conservation among these proteins (Extended Data Fig. 3) suggests 
that they have similar activity profiles. We have therefore named them 
long-chain acyl-CoA carboxylases (LCCs). Broad-spectrum activity has 
been observed for a few of the acyl-CoA carboxylases in actinomycetes. 
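
The kinetic comparison in this paragraph can be restated in terms of catalytic efficiency (kcat/Km). As a rough worked relation, assuming the two kcat values are exactly equal (the text says only that they are comparable) and taking the ~350-fold Km ratio at face value:

```latex
\frac{\left(k_{\mathrm{cat}}/K_{\mathrm{m}}\right)_{\text{palmitoyl-CoA}}}
     {\left(k_{\mathrm{cat}}/K_{\mathrm{m}}\right)_{\text{acetyl-CoA}}}
= \frac{k_{\mathrm{cat}}^{\text{palmitoyl-CoA}}}{k_{\mathrm{cat}}^{\text{acetyl-CoA}}}
  \times
  \frac{K_{\mathrm{m}}^{\text{acetyl-CoA}}}{K_{\mathrm{m}}^{\text{palmitoyl-CoA}}}
\approx 1 \times 350 = 350
```

Under that assumption, the preference for long-chain substrates would correspond to a catalytic efficiency roughly 350 times higher for palmitoyl-CoA than for acetyl-CoA; the measured values are those given in Extended Data Table 1.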

To define the holoenzyme architecture, we determined the crystal 
structure of M. avium subspecies paratuberculosis LCC (MapLCC) at 
3.0 Å resolution (Fig. 1b-e); MapLCC shares 52% amino acid sequence 
identity with R. palustris LCC (Extended Data Fig. 3). The holoenzyme 
hexamer is situated on a crystallographic three-fold axis, and there is a 
dimer in the asymmetric unit. The atomic model has good agreement 
with the crystallographic data and the expected geometric parameters 
(Extended Data Table 2). Several segments of the protein have poor or no 
electron density and are not included in the atomic model. These include 
parts of the linkers from the BCCP to the BC and CT domains (Fig. 1b), 
although there is no ambiguity in assigning the BCCP domain to a spe- 
cific monomer. A different assignment will result in gaps that are too 
large to be bridged by the missing residues. 

The overall structures of the two MapLCC monomers in the asym- 
metric unit are similar. With their CT domains superposed, a difference 
of 3° is seen in the orientation of their BC domains (Fig. 1b). The BCCP 
domains have a larger difference, corresponding to a rotation of 15°, indi- 
cating some asymmetry in the holoenzyme hexamer. The BC, BCCP 
and CT domains of each monomer do not have any direct contact with 
each other. 

The structure of the holoenzyme hexamer of MapLCC has the shape 
of an equilateral triangle, obeying 32 symmetry and with a length of 
~180 Å for each side (Fig. 1c) and a thickness of ~65 Å (Fig. 1d, e). A 
hexamer of the CT domain forms the central core of the structure, with 
three CT domains in each layer and CT dimers being formed by one 
domain from each layer. Dimeric BC domains are located at the vertices 
of the triangle, contacting CT domain dimers. The BCCP domains are 
situated between the BC and CT domains but not in the active site of 
either domain. The lysine residue that would be biotinylated is located 
on the surface, more than 15 Å away from the nearest BC or CT domain, 
and its side chain is disordered. The BCCP domain is not biotinylated in 
this holoenzyme, even though it was expressed under conditions identical 
to those for R. palustris LCC, which was completely biotinylated. 

The overall architecture of the MapLCC holoenzyme is strikingly different 
from those of PCC (Fig. 1f) and MCC (Fig. 1g), even though 
all three enzymes are ~750 kDa oligomers made up of homologous BC, 
CT and BCCP domains. The central CT domain core of MapLCC is similar 
to that of the β6 hexamer core of PCC, with a root mean squared 
distance of 1.7 Å for 2,073 equivalent Cα atoms between them (Extended 
Data Fig. 4). In contrast, whereas the BC domains are located above and 
below the central core in PCC and MCC, they are positioned at the side 
of the central core in MapLCC. Moreover, the BC domain is a monomer 
in PCC and a weakly associated trimer in MCC, but it is a dimer in 
MapLCC. In fact, this dimer is similar to that for the BC subunit of 
E. coli ACC (Extended Data Fig. 4) and the BC domain of PC. 
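
The root mean squared distance quoted above for 2,073 equivalent Cα atoms follows the usual definition: the square root of the mean squared distance between paired atoms. A minimal Python sketch is given below, assuming the two structures have already been superposed and their atoms paired; the function name and the toy coordinates are made up for illustration and are not taken from the MapLCC or PCC structures.

```python
import math


def rms_distance(coords_a, coords_b):
    """Root mean squared distance between two equally long lists of (x, y, z)
    coordinates. The structures are ASSUMED to be superposed already."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy example: two atoms, each displaced by exactly 1 Å along z.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(f"RMS distance = {rms_distance(model, reference):.2f} Å")
```

In practice the superposition itself (finding the rotation and translation that minimise this quantity) is done by structure-alignment software; the sketch covers only the final distance calculation.
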

A BT domain was identified in the structures of PCC (Fig. 1f) and 
MCC (Fig. 1g), located between the BC and BCCP domains in the primary 
sequence (Extended Data Fig. 1) and having an important function in 
mediating interactions between their α (BC) and β (CT) subunits. 
This domain does not exist in LCC, because there are insufficient residues 
in the linker between BC and BCCP to form such a domain (Extended 
Data Fig. 1). However, the BC-BCCP linker and the BCCP-CT linker do 
participate in mediating interactions in the MapLCC holoenzyme, and 
there are also direct contacts between the BC and CT domains (see below). 


Figure 1 | Crystal structure of LCC from M. avium subspecies 
paratuberculosis (MapLCC). a, Domain organization of MapLCC. 
The domains are labelled and given different 
colours. b, Overlay of the structures of the two 
MapLCC monomers in the asymmetric unit (one 
in colour, the other in grey). Residues that are 
missing in the linkers from BCCP are indicated 
with dashed lines. c, Overall structure of the 
720 kDa hexameric holoenzyme of MapLCC. The 
six monomers are labelled. The domains in the 
three monomers in the top layer (numbered 1, 2 
and 3) are coloured as in a. The BC, BCCP, N and C 
CT domains in the three monomers in the bottom 
layer (numbered 4, 5 and 6) are coloured pink, 
pale blue, pale cyan and pale yellow, respectively. 
The disordered region of the BCCP-CT linker is 
indicated with a dashed line (black). The BC active 
sites are indicated with asterisks (black). The CT 
active sites are on the side of the CT domain core, 
indicated with black arrows. d, Structure of the 
MapLCC holoenzyme viewed down the BC 
domain dimer (red arrow in c). e, Structure of 
the MapLCC holoenzyme viewed down the blue 
arrow in c. The CT active sites are indicated with 
asterisks (black). f, Structure of the 750 kDa α6β6 
PCC holoenzyme. The view is equivalent to that 
in d. g, Structure of the 750 kDa α6β6 MCC 
holoenzyme. The structure figures were produced 
with PyMOL (http://www.pymol.org). 


The domains of the MapLCC monomers are swapped extensively in 
the holoenzyme hexamer. The CT domains of two monomers related 
by a BC domain dimer are located far from each other (Fig. 2a). Simi- 
larly, the BC domains of two monomers related by a CT domain dimer 
are also located far from each other (Fig. 2b). Interactions between these 
domains therefore occur only in the context of the holoenzyme, and the 
structure of the monomer alone is unlikely to be stable (Fig. 1b). 

A total of ~8,200 Å² of the surface area of each MapLCC monomer 
is buried in the holoenzyme. The majority of this surface, 5,700 Å², is 
buried by the interface among the CT domains in the central core, and 
a 1,000 Å² surface is buried in the BC dimer interface (Fig. 3a, b). The 
remaining interfaces in the holoenzyme make smaller contributions to 
the surface area burial, 600 Å² in the BC-CT interface, 500 Å² for the 
linkers from BCCP, and 400 Å² from the BCCP domain. Nonetheless, 
all of these smaller interfaces together may be important for stabilizing 
the holoenzyme. 
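
The individual interface contributions listed above add up to the stated total of about 8,200 Å² per monomer. A short Python check of this arithmetic follows; the dictionary keys are shorthand labels chosen here for illustration, and the percentages are derived from the rounded values in the text.

```python
# Buried surface area per MapLCC monomer (Å^2), as listed in the text.
BURIED_AREA = {
    "CT-CT (central core)": 5700,
    "BC dimer": 1000,
    "BC-CT": 600,
    "linkers from BCCP": 500,
    "BCCP domain": 400,
}

TOTAL_AREA = sum(BURIED_AREA.values())

if __name__ == "__main__":
    print(f"total buried area: {TOTAL_AREA} Å^2")
    for interface, area in BURIED_AREA.items():
        print(f"  {interface}: {area} Å^2 ({100 * area / TOTAL_AREA:.0f}%)")
```

On these figures the CT core accounts for roughly 70% of the buried surface, and the three smallest interfaces together contribute 1,500 Å², which is more than the BC dimer interface alone.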

The primary BC-CT interface involves three monomers of MapLCC. 
For example, a β-hairpin structure in the BC domain of monomer 4 
contacts the CT domains of monomer 2 (C domain) and monomer 6 
(N domain; Fig. 3c). Similarly, the BC-BCCP linker just before the BCCP 
domain in monomer 1 contacts the CT domain of monomer 2 and the 
BC domain of monomer 4 (Fig. 3d). These interactions also demon- 
strate the extensive connections between the monomers in the holoen- 
zyme. The interfaces involve van der Waals contacts, hydrogen bonding 
and ionic interactions. 

We also carried out electron microscopy (EM) studies on the MapLCC 
holoenzyme. Negative-stain EM images showed monodisperse particles 
of similar sizes but variable shapes (Extended Data Fig. 5). Classification 
in SPARX of ~25,000 particles yielded 308 classes that represented 
65% of the data set (Extended Data Fig. 5). Cross-correlation analysis 
identified class averages very similar to the top view (cross-correlation 
coefficient 0.856; Fig. 3e) and side view (cross-correlation coefficient 0.752; 
Fig. 3f) of the crystal structure, confirming the holoenzyme architecture 
seen in the crystal. At the same time, substantial variations in the three 
peripheral densities (corresponding to the BC domains) relative to the 
central core (the CT domains) are also observed (Extended Data Fig. 5). 
Each peripheral density can appear as a single or bilobed feature (Fig. 3g), 
probably representing different views of the BC dimer. The densities can 
also be located in different positions relative to the central core, breaking 
the alignment of the two-fold axes of the BC and CT dimers (Fig. 3h). 
Overall, the EM data indicate that the peripheral BC domains move as 
dimers and are flexibly tethered to the CT core (Supplementary Video 1), 
consistent with the crystal structure that shows that the BC-CT contact 
is relatively weak in MapLCC. 
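
The cross-correlation coefficients quoted above (0.856 and 0.752) score how closely a class average matches a projection of the crystal structure. As a minimal sketch, the Python function below computes a mean-subtracted, normalised cross-correlation between two aligned images of equal size. This is an assumption about the general form of such a score: the text does not specify the exact definition or alignment procedure used in SPARX, and the function name and toy images are hypothetical.

```python
import math


def cross_correlation(image_a, image_b):
    """Normalised cross-correlation coefficient between two images given as
    equally sized 2D lists of pixel values. The images are ASSUMED to be
    aligned already (no rotational or translational search is done here)."""
    a = [pixel for row in image_a for pixel in row]
    b = [pixel for row in image_b for pixel in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0:
        raise ValueError("a constant image has no defined correlation")
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


if __name__ == "__main__":
    # Toy 3 x 3 images: identical patterns correlate perfectly, while an
    # inverted-contrast copy gives a coefficient of -1.
    pattern = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]
    inverted = [[4 - p for p in row] for row in pattern]
    print(f"{cross_correlation(pattern, pattern):.3f}")
    print(f"{cross_correlation(pattern, inverted):.3f}")
```

A coefficient of 1 would indicate identical images up to brightness and contrast scaling, so values of 0.75 to 0.86 between noisy negative-stain averages and a filtered projection indicate a close match.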

Each BC active site is located ~40 Å from a CT active site in the 
MapLCC holoenzyme (Fig. 4a). The BCCP domain, although not located 
in either active site, can access both of them, through conformational 
changes in its linkers. The crystal structure shows that the BCCP domain 
of monomer 1 visits the BC active site of monomer 4 and the CT active 
site at the interface of monomers 2 and 6 (Fig. 4a), suggesting that each 
cycle of catalysis requires the participation of four monomers. This 
again indicates the extensive communications between the monomers 
in this holoenzyme. The residues in both active sites are generally con- 
served with those in other biotin-dependent carboxylases, suggesting a 
similar catalytic mechanism for LCC. The CT active site has a pocket 
that can accommodate short-chain and medium-chain acyl groups (Ex- 
tended Data Fig. 6), and a conformational change is needed to bind 
long-chain substrates. This conformational flexibility may be impor- 
tant for the enzyme to adapt to and be active towards the broad col- 
lection of substrates. 

In most of the other biotin-dependent carboxylases, BCCP is located 
at the end of a polypeptide chain, and there is therefore only one linker 
to the rest of the protein. The LCCs, and the eukaryotic ACCs (Extended 
Data Fig. 1), are distinct in that BCCP is located in the middle of those 
proteins. There are therefore two linkers from BCCP to the rest of the 
protein. The amino acid sequences of these two linkers in the LCC enzymes 
are not conserved (Extended Data Fig. 3). They are therefore likely to 
be flexible and allow the movement of BCCP during catalysis, which is 
consistent with the fact that a portion of both linkers is disordered in 
the structure (Fig. 1b). Residues in both linkers that are included in the 
current atomic model also have high B values (Extended Data Fig. 4). 
In addition, the B domain of BC and a loop near the carboxy terminus 
of the protein have high B values and are partly disordered. 

The structure of MapLCC also has implications for the holoenzymes 
of other biotin-dependent carboxylases, especially the family of acyl-CoA 
carboxylases in M. tuberculosis, S. coelicolor and other actinomycetes. 
These enzymes contain two subunits, with BC-BCCP in the α subunit 
and CT in the β subunit, and there are not enough residues in one of the 
α subunits for a BT domain (Extended Data Fig. 1). The holoenzyme is 
an α6β6 dodecamer, and the structure of the β6 hexamer is similar to 
that of the CT domain hexamer in MapLCC. It is therefore likely that the 
holoenzymes of such two-subunit carboxylases (lacking the BT domain) 
share a similar architecture to that of MapLCC (Fig. 1c) rather than PCC 
(Fig. 1f). M. tuberculosis does not have an LCC homologue, indicating 
some differences between these mycobacterial species. 

Figure 2 | Extensive domain swapping in the 
structure of the MapLCC holoenzyme. a, The CT 
domains of two monomers (in colour) related by a 
BC dimer are located far from each other (note 
that they are in opposite layers) and participate in 
dimerization with different monomers (in grey). 
The six monomers are labelled. b, The BC domains 
of two monomers related by a CT dimer are located 
far from each other. 

Eukaryotic ACCs are also single-chain, multi-domain enzymes (Ex- 
tended Data Fig. 1), although there are substantial differences from LCC. 
The eukaryotic ACCs contain ~1,000 additional residues, including a 
unique central region of ~700 residues, and they are likely to carry a 
BT domain as well. The overall architecture of the eukaryotic ACCs is 
therefore likely to be different from that of MapLCC observed here. 

We have begun to characterize the physiological functions of LCC, using 
P. aeruginosa (strain PA14) as the model organism. The P. aeruginosa 
enzyme (locus name PA14_46320, homologue of PA1400 in P. aeruginosa 
PAO1) shares 59% amino acid sequence identity with R. palustris LCC. 
This organism also carries the multi-subunit ACC, MCC and geranyl-CoA 
carboxylase (GCC), although it lacks a homologue for PCC. The 
multi-subunit ACC is probably essential, in a similar manner to the E. coli 
enzyme, because no transposon insertions in its subunits are found in a 
PA14 transposon mutant library. In contrast, transposon insertions are 
found in the loci encoding LCC, MCC and GCC, suggesting that they are 
not essential for growth under the conditions used to produce this library. 

We produced a markerless deletion of LCC/PA14_46320, confirming 
that it is not essential for cell survival. The activity profiles of the mutant 
under ~2,000 conditions were characterized by using a phenotype microarray, 
which monitored its ability to reduce a tetrazolium dye. The 
incubation conditions included different carbon or nitrogen sources, 
nutrient supplements, osmolytes, pH values and antibiotics. Although 
most of the conditions showed comparable profiles between the wild 
type and the LCC deletion mutant, phenotypic differences were observed 
for several conditions (Extended Data Fig. 7), and two of these were con- 
sidered significant on the basis of the phenotype microarray analysis— 
using fumarate as the sole carbon source and the Met-Val dipeptide as 
the sole nitrogen source (Fig. 4b). It is not clear how this LCC enzyme 
is linked to the utilization of selected carbon and nitrogen sources in 
P. aeruginosa. The metabolism of both Met and Val is likely to produce 
propionyl-CoA, and the PCC activity of LCC may be important for its 
further degradation, as is the case in many other organisms. This is also 
supported by the fact that P. aeruginosa lacks a proper PCC enzyme. 

A long-chain acyl-CoA carboxylase activity is needed for the biosynthesis 
of mycolic acid in Mycobacterium and other actinomycetes. 
However, the LCC enzyme studied here is unlikely to be essential for this 
function, because its homologue is absent in M. tuberculosis. Moreover, 
mycolic acid is not known to be present in P. aeruginosa, R. palustris and 
other bacteria, suggesting that this LCC is likely to have different physio- 
logical functions. 
Figure 3 | Interactions between the central core and the rest of the 
holoenzyme. a, Two views of the molecular surface of a MapLCC monomer, 
coloured by the domains. b, Residues that mediate interactions in the MapLCC 
holoenzyme are indicated by the colours of the domains that they contact. 
For example, a pale cyan patch on the surface of the C domain of CT indicates 
residues that interact with the N domain of another monomer in the 
holoenzyme. The pink patch on BC indicates residues in the BC dimer 
interface, and the pale cyan patch indicates residues in the interface with CT. 
The major areas of contacts are between the CT domains and in the BC domain 
dimer. The views are identical to those in a. c, Residues in the interface 
between the BC and CT domains. The domains and the monomers they belong 
to are labelled. d, Residues in the interface between the BC-BCCP linker just 
before the BCCP domain and the rest of the holoenzyme. e, Class average 
representing the top view (left) most similar to the crystal structure and the 
corresponding projection from the crystal structure filtered to 30 Å resolution 
(right). f, Class average representing the side view (left) most similar to the 
crystal structure and the corresponding projection from the crystal structure 
filtered to 30 Å resolution (right). g, Three panels showing that each of the 
peripheral BC domains can appear as a single or bilobal density, probably 
representing different orientations of the BC dimer. h, Two panels illustrating 
that the BC domains can adopt different positions around the central CT core. 
The side length of the individual panels in e-h is 340 Å. 


The striking differences between the architectures of the LCC, PCC 
and MCC holoenzymes are also linked to functional differences between 
them. In particular, the N and C domains of the β subunits of PCC and 
MCC are swapped relative to each other, and this is coupled to the different 
substrate specificity of the two enzymes. PCC carboxylates the 
α carbon of an acid (as a CoA ester), whereas MCC carboxylates the 
γ carbon of an α,β-unsaturated acid (Extended Data Fig. 1). The structural 
differences therefore suggest that there are two lineages of the biotin-dependent 
carboxylases, one containing PCC, ACC and LCC, and the 
other containing MCC and possibly GCC. At the same time, PCC and 
LCC share propionyl-CoA carboxylase activity, indicating that similar 
biochemical activities can be supported by holoenzymes with markedly 
different architectures as well. 
Figure 4 | Catalysis and function of LCC. a, The BC and CT active sites (black 
asterisks) of MapLCC are separated by ~40 Å (red arrow). Molecular surface of 
MapLCC is shown, coloured as in Fig. 1c. The domains are labelled by the 
monomer they belong to. The Lys side chain of BCCP to which biotin would 
be connected is shown in black. The binding modes of ADP (green) and 
biotin (black) to the E. coli BC subunit are shown as stick models. The binding 
mode of CoA (grey) to the CT domain of yeast ACC and biotin (black) in the β 
(CT) subunit of PCC are also shown. b, Phenotypic differences between 
wild-type and LCC knockout (ΔPA14_46320) P. aeruginosa strains, revealed 
by a colorimetric assay that monitors the reduction of a tetrazolium dye. 
Assays were performed twice in each medium for the wild-type (red and 
orange) and mutant (blue and cyan) strains. For each panel, the horizontal axis 
is time (24 h) and the vertical axis is OmniLog signal. 


Overall, our studies have identified a new member of the biotin-depen- 
dent carboxylase family, and revealed a novel architecture for its holo- 
enzyme. These observations are also relevant for other members of this 
family. Moreover, the differences between the architectures of LCC, PCC 
and MCC holoenzymes, despite their sharing domains with homologous 
structures, indicate that these domains can be arranged in remarkably 
different ways to form the various holoenzymes. This may have substan- 
tial implications for other multi-domain proteins, especially eukaryotic 
ACCs, and for protein structure and sequence conservation in general. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 


Protein expression and purification. Full-length LCCs from several different 
bacterial organisms, including R. palustris, M. avium subspecies paratuberculosis 
and P. aeruginosa, were amplified from genomic DNA by PCR and cloned into 
pET28a, pET26b and/or pET24d vectors (Novagen). The plasmids were transformed 
into BL21Star (DE3) cells (Invitrogen). Protein expression was induced with the 
addition of 1 mM isopropyl β-D-thiogalactoside, and the cells were grown at 16 °C 
for 16–20 h. 

To facilitate biotinylation, the recombinant enzyme was co-expressed with the 
E. coli biotin ligase BirA, and 15 mg l⁻¹ biotin was added to the medium. An avidin 
shift assay of the purified enzymes showed that R. palustris LCC was completely 
biotinylated. However, purified MapLCC did not show any biotinylation, possibly 
indicating some degree of selectivity of the BirA enzyme. Expression of P. aeruginosa 
LCC did not produce any soluble protein and was not pursued further. 

Cells were lysed by sonication in a buffer containing 20 mM Tris-HCl pH 8.0, 
250 mM NaCl, 5% (v/v) glycerol, 10 mM 2-mercaptoethanol and 1 mM 
phenylmethylsulphonyl fluoride. Soluble enzyme was purified by Ni²⁺-nitrilotriacetate 
(Qiagen), anion-exchange and gel-filtration (Sephacryl S-300; GE Healthcare) 
chromatography. The S-300 running buffer for MapLCC contained 25 mM HEPES 
pH 7.4, 250 mM NaCl and 2.5 mM dithiothreitol. The purified protein was 
concentrated to 6 mg ml⁻¹, and the solution was supplemented with 5% (v/v) glycerol 
before being flash-frozen in liquid nitrogen and stored at −80 °C. 

The selenomethionyl MapLCC protein was produced in B834 (DE3) cells (Novagen) 
that were grown in defined LeMaster medium supplemented with 
selenomethionine. The protein was purified with the same protocol as that for the native 
enzyme. 
Protein crystallization. MapLCC was crystallized at 4°C using the microbatch 
method under paraffin oil. The protein solution was mixed with a precipitant solu- 
tion containing 0.1 M Bis-Tris propane pH 7.5-8.5 and 1.5-2.0 M ammonium sul- 
phate. Crystals took 4-6 weeks to grow to full size, and larger crystals were obtained 
by microseeding. They were cryoprotected with reservoir solution supplemented 
with 12-15% (v/v) glycerol and flash-frozen in liquid nitrogen for data collection at 
100 K. The C-terminal His tag on the protein was not removed before crystallization. 
Data collection and structure determination. X-ray diffraction data for the native 
(wavelength 1.075 Å) and selenomethionyl (0.979 Å) crystals were collected with 
a Q315 charge-coupled device (Area Detector Systems Corporation) at the X29A 
beamline of the National Synchrotron Light Source. The diffraction images were 
processed with the HKL package. The crystals belong to space group P2₁3, with 
cell dimensions of a = b = c = 220.9 Å. There are two MapLCC monomers in the 
crystallographic asymmetric unit. 

The structure of MapLCC was solved by a combination of molecular replacement 
and selenomethionyl SAD phasing. The orientation and position of the BC, 
CT and BCCP domains were located with the program Phaser. The Se sites were 
located with the program SHELX, and SOLVE/RESOLVE was used for phasing 
the reflections and automated model building. The atomic model was built with 
the program Coot. The structure refinement was performed with the program 
CNS. The crystallographic information is summarized in Extended Data Table 2. 

We also obtained a second crystal form of MapLCC, with an entire hexamer in 
the asymmetric unit, and were able to collect an X-ray diffraction data set to 4.3 Å 
resolution (space group P2₁2₁2₁, a = 102 Å, b = 292 Å, c = 314 Å). The structure 
of this crystal form was readily solved by the molecular replacement method, and it 
revealed essentially the same holoenzyme architecture (data not shown). 
Electron microscopy and image processing. Purified MapLCC was prepared by 
conventional negative staining with 0.75% (w/v) uranyl formate. Images were 
collected with a Tecnai T12 electron microscope (FEI) equipped with an LaB₆ 
filament and operated at an acceleration voltage of 120 kV. Images were recorded using 
low-dose procedures on an UltraScan 895 4K × 4K charge-coupled device (CCD) 
camera (Gatan) using a defocus of −1.5 μm and a nominal magnification of ×52,000. 
The calibrated magnification was ×70,527, yielding a pixel size of 2.13 Å on the 
specimen level. 

BOXER, the display program associated with the EMAN software package, was 
used to select 24,535 particles interactively from 270 CCD images, and the SPIDER 
software package was used to window the particles into 160 × 160-pixel images. 
To perform iterative stable alignment and clustering (ISAC) in SPARX, the size 
of the particle images was reduced to 64 × 64 pixels, and the particles were 
pre-aligned and centred. ISAC was run on the Orchestra High Performance Compute 
Cluster at Harvard Medical School (http://rc.hms.harvard.edu), specifying 200 images 
per group and a pixel error of 0.7. After 19 generations, 308 classes were obtained, 
accounting for 15,932 particles (65% of the entire data set) (Extended Data Fig. 5). 
Averages of these classes were calculated using the original 160 × 160-pixel images. 
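
The imaging numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only illustrative: the physical detector pixel size (15 μm) is an assumption that is not stated in the text, chosen because it reproduces the quoted 2.13 Å pixel size at the calibrated magnification. All other values are taken from the paragraphs above.

```python
# Sanity checks on the electron-microscopy numbers quoted in the text.
DETECTOR_PIXEL_UM = 15.0      # ASSUMPTION: physical CCD pixel size, not given in the text
CALIBRATED_MAG = 70_527       # calibrated magnification (from the text)
QUOTED_PIXEL_A = 2.13         # pixel size at the specimen level, in Å (from the text)
BOX_PIXELS = 160              # side of the windowed particle images (from the text)
PARTICLES_TOTAL = 24_535      # particles selected (from the text)
PARTICLES_IN_CLASSES = 15_932 # particles accounted for by the ISAC classes (from the text)


def pixel_size_angstrom(detector_pixel_um, magnification):
    """Pixel size at the specimen level in Å (1 μm = 10,000 Å)."""
    return detector_pixel_um * 1e4 / magnification


pixel_a = pixel_size_angstrom(DETECTOR_PIXEL_UM, CALIBRATED_MAG)
box_side_a = BOX_PIXELS * QUOTED_PIXEL_A
fraction_classified = PARTICLES_IN_CLASSES / PARTICLES_TOTAL

print(f"pixel size: {pixel_a:.2f} Å")
print(f"box side: {box_side_a:.1f} Å")
print(f"classified: {fraction_classified:.1%}")
```

Under that assumption the computed pixel size rounds to the quoted value, the 160-pixel box corresponds to the roughly 340 Å panel side given in the figure legends, and the classified fraction rounds to the quoted 65%.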

To confirm that the ISAC averages were representative of the entire data set, the 
particles were also subjected to ten cycles of multi-reference alignment in SPIDER. 
Each round of multi-reference alignment was followed by K-means classification, 
specifying 300 output classes (Extended Data Fig. 5). 

To compare the class averages with the crystal structure, the crystal structure was 
Fourier transformed, filtered to 30 Å with a Butterworth low-pass filter, and 
transformed back. Evenly spaced projections were calculated at 4° intervals and 
subjected to ten cycles of alignment with masked EM class averages. The class averages 
most similar to the top and side views of the crystal structure and their cross- 
correlation coefficients are presented in Fig. 3e, f. 

To visualize the structural heterogeneity of MapLCC in solution, 179 averages 
obtained by ISAC that showed three peripheral densities were selected (indicated 
by asterisks in Extended Data Fig. 5), ordered according to their correlation coef- 
ficients, and used to prepare Supplementary Video 1. This structural variability is 
probably the reason why it was not possible to calculate a three-dimensional map 
from cryo-EM images of vitrified MapLCC samples. 

Enzymatic assays. The kinetic assays monitored the hydrolysis of ATP by R. palustris 
LCC in the presence of various acyl-CoA substrates, using coupling enzymes to 
link ADP formation to NADH oxidation. The reaction mixture contained 
100 mM HEPES pH 7.5, 40 mM KHCO₃, 1.5 mM ATP, 0.4 mM NADH, 200 mM 
KCl, 10 mM MgCl₂, 0.5 mM phosphoenolpyruvate, 3.5/3.7 U of lactate 
dehydrogenase/pyruvate kinase (Sigma), 0.25 μM enzyme (except for MCC, which was at 
1.2 μM) and various concentrations of acyl-CoA. The absorbance at 340 nm was 
monitored for 1.5 min. The initial velocities were fitted to the Michaelis-Menten 
equation using the program Origin (OriginLab). 
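
As a rough illustration of what fitting initial velocities to the Michaelis-Menten equation involves, here is a minimal sketch. It does not reproduce the Origin procedure used in the study: it swaps in the simpler Hanes-Woolf linearization (s/v = s/Vmax + Km/Vmax) with ordinary least squares, and the substrate concentrations and kinetic constants are invented, noise-free values used only to show that the method recovers them.

```python
def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, velocity):
    """Estimate (vmax, km) from the linear form s/v = s/vmax + km/vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals 1 / vmax
    intercept = mean_y - slope * mean_x    # equals km / vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Hypothetical, noise-free titration (values are NOT from the paper).
true_vmax, true_km = 0.30, 0.20
concentrations = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
rates = [michaelis_menten(s, true_vmax, true_km) for s in concentrations]

fit_vmax, fit_km = hanes_woolf_fit(concentrations, rates)
print(f"vmax = {fit_vmax:.3f}, Km = {fit_km:.3f}")
```

With real, noisy data a direct nonlinear least-squares fit is generally preferred, because linearized forms distort the error structure of the measurements.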

Construction of an LCC deletion mutant in P. aeruginosa. A markerless deletion 
was generated for the gene PA14_46320 in P. aeruginosa PA14, using previously 
described methods. In brief, ~1-kilobase flanking regions for PA14_46320 were 
amplified with primers listed in Extended Data Table 3 and recombined into the 
allelic-replacement vector pMQ30 through gap repair cloning in the yeast strain 
InvSc1 (ref. 43). This plasmid was transformed into E. coli BW29427 and moved into 
PA14 using biparental conjugation. Luria-Bertani (LB) agar containing 100 μg ml⁻¹ 
gentamicin was used to select for P. aeruginosa single recombinants. Markerless 
deletions in PA14_46320 (double recombinants) were then selected with the use 
of LB agar plates devoid of NaCl and containing 10% (w/v) sucrose as a counter- 
selection, and their genotypes were confirmed by PCR. 

Phenotype microarrays. Phenotype microarray screening was performed by Biolog, 
Inc., as described. 

Extended Data Figure 1 | Domain organization of biotin-dependent 
carboxylases. a, Reactions catalysed by biotin-dependent carboxylases. Biotin 
is linked to the side chain of a Lys residue in BCCP, and this flexible arm has a 
maximum length of ~16 Å. The BCCP domain must also translocate to 
reach both active sites, separated by distances of 40–85 Å based on known 
holoenzyme structures (swinging domain model). b, Domain organizations of 
several representative biotin-dependent carboxylases. Homologous domains 
are given the same colours. The CT domain of PC has a completely different 
sequence and structure from those of ACC and PCC. The proteins are drawn to 
scale, and a scale bar is shown at the bottom. BT, BC-CT interaction 
domain; PT, PC tetramerization domain, also known as allosteric domain. 
c, Chemical structures of the substrates of ACC, PCC, LCC, MCC and GCC. 
The site of carboxylation is indicated with the red arrow. 

Extended Data Figure 2 | Phylogenetic trees for selected biotin-dependent 
carboxylases. a, Phylogenetic tree for LCC homologues in a collection of 
organisms. The three homologues studied in this paper are shown in red. 
b, Phylogenetic tree for PCC homologues in a collection of organisms, based on 
a sequence alignment of the β subunit. Modified from an output from the 
Phylogeny.fr server. 

Extended Data Figure 3 | Sequence alignment of long-chain acyl-CoA 
carboxylases (LCCs) from M. avium subspecies paratuberculosis 
(MapLCC), R. palustris (RpLCC), and P. aeruginosa (PaLCC). The various 
domains in the proteins are labelled. The BCCP domain has two linkers to the 
rest of the protein. Modified from an output from ESPript. 

Extended Data Figure 4 | Structural comparisons of domains in LCC with 
related enzymes. a, Stereo drawing of the overlay of the structure of the BC 
domain dimer of MapLCC (in colour) with that of BC subunit dimer of E. coli 
ACC (in grey). The bound positions of biotin (black) and ADP (green) in 
the E. coli BC structure are also shown. The two-fold axis of the dimer is 
indicated by the black oval. With the two monomers at the bottom overlaid, 
a difference of 21° in the orientations of the two monomers at the top is 
observed. Most of the B domain of BC is ordered in one of the two monomers of 
MapLCC. In the other monomer, only weak electron density is observed 
for a few segments, and the B domain is not modelled. b, Overlay of the 
structures of the CT domain hexamer of MapLCC (in colour) and the β subunit 
of PCC (in grey). Each enzyme is highly conserved across species; the overlay 
should therefore be meaningful. c, Stereo drawing of the overlay of the CT 
domain dimer of MapLCC (in colour) and the β subunit of PCC (in grey). 
The view is down the red arrow in b. The bound position of biotin in the 
holoenzyme is shown in black. The position of CoA is modelled on that of CoA 
bound in the active site of the CT domain of yeast ACC. d, Plot of the 
temperature factor value of each Cα atom in the two monomers (in red and 
blue). Several linker regions with high temperature factor values are indicated. 

Extended Data Figure 5 | Electron microscopy studies of LCC. 
a, Representative raw image of negatively stained MapLCC. ‘S’ marks a side 
view of the holoenzyme, and ‘C’ indicates a contaminant. Scale bar, 500 Å. 
b, The 308 class averages of negatively stained MapLCC obtained from 19 
generations of the iterative stable alignment and clustering (ISAC) procedure 
implemented in SPARX. These class averages represent 65% (15,932 particles) 
of the entire data set (24,535 particles). Averages representing side views 
are marked with ‘S’, averages that were used to create Supplementary Video 1 
are marked with an asterisk, and averages that represent a contaminant are 
marked with ‘C’. The side length of the individual panels is 340 Å. c, The 
averages obtained by classifying all 24,535 particles of negatively stained 
MapLCC into 300 classes using K-means classification in SPIDER. Averages 
are shown in rows, with the most populous class at the top left and the least 
populous class at the bottom right. The side length of the individual panels 
is 340 Å. 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Figure 6 | The CT active site of LCC. a, Stereo drawing of the 
overlay of the CT active site (cyan and yellow) of MapLCC with that of 
PCC (grey). The model of CoA was obtained from the structure of the 
complex with yeast ACC CT domain. The α6 helix in the N domain of 
monomer 6 shows a more closed conformation (indicated by the red arrow) 
and clashes with the CoA model. There is also a clash with the adenine base 
of CoA. There may be a conformational change in this region of MapLCC for 
CoA binding. b, Molecular surface of the CT active-site region of MapLCC. 
The α6 helix in the N domain of monomer 6 was removed for a clearer view of 
the active site. 

Extended Data Figure 7 | Phenotypic differences between wild-type and 
LCC knockout (ΔPA14_46320) P. aeruginosa strains, revealed by a 
colorimetric assay that monitors the reduction of a tetrazolium dye. The 
conditions were identified from a screen that sampled 1,920 different media 
(Biolog Inc.). Assays were performed twice in each medium for the wild-type 
(red and orange) and mutant (blue and cyan) strains. Shown are activity 
profiles for strains incubated with Gly-Pro as the sole carbon source (top panel) 
and Asp-Phe, Glu-Val and Met-Asp as the sole nitrogen source (bottom 
panels). For each panel, the horizontal axis is time (24 h), and the vertical axis is 
OmniLog signal. 

Extended Data Table 1 | Kinetic parameters of R. palustris LCC towards various substrates 

Substrate                 Km (mM)             kcat (s⁻¹) 
Acetyl-CoA (C2)           2.0 ± 0.3           0.77 ± 0.05 
Propionyl-CoA (C3)        2.2 ± 0.5           0.54 ± 0.07 
Butyryl-CoA (C4)          0.82 ± 0.16         0.35 ± 0.03 
Hexanoyl-CoA (C6)         0.20 ± 0.04         0.30 ± 0.01 
Octanoyl-CoA (C8)         0.37 ± 0.06         0.33 ± 0.01 
Decanoyl-CoA (C10)        0.033 ± 0.005       0.28 ± 0.01 
Lauroyl-CoA (C12)         0.019 ± 0.003       0.24 ± 0.005 
Myristoyl-CoA (C14)       0.026 ± 0.002       0.45 ± 0.01 
Palmitoyl-CoA (C16)       0.0058 ± 0.0005     0.16 ± 0.003 
3-methylcrotonyl-CoA      L041                0.074 ± 0.004 

The errors are standard deviations from fitting one titration curve to the Michaelis-Menten equation. 



Extended Data Table 2 | Data collection and refinement statistics 

                          MapLCC 
Data collection 
  Space group             P2₁3 
  Cell dimensions 
    a, b, c (Å)           220.9, 220.9, 220.9 
    α, β, γ (°)           90, 90, 90 
  Resolution (Å)          50–3.0 (3.1–3.0)* 
  Rmerge                  9.6 (44.6) 
  I/σI                    10.3 (1.9) 
  Completeness (%)        91 (72) 
  Redundancy              3.3 (2.1) 
Refinement 
  Resolution (Å)          50–3.0 
  No. reflections         64,953 
  Rwork / Rfree           20.9 / 26.2 
  No. atoms 
    Protein               14,632 
    Ligand/ion            0 
    Water                 0 
  B-factors 
    Protein               66.4 
    Ligand/ion            – 
    Water                 – 
  R.m.s. deviations 
    Bond lengths (Å)      0.007 
    Bond angles (°)       1.4 

Two crystals were used for data collection. 
*The highest-resolution shell is shown in parentheses. 

Extended Data Table 3 | Primers used for making the LCC deletion mutant 

Primer                    Sequence (5' to 3') 
ΔPA14_46320 flank 1F      CCAGGCAAATTCTGTTTTATCAGACCGCTTCTGCGTTCTGATGCTGCCTGCTCTACATGCT 
ΔPA14_46320 flank 1R      CCTTCAACGCCTTGCTGATCCAGCTACCTGGAGATCGAC 
ΔPA14_46320 flank 2F      GTCGATCTCCAGGTAGCTGGATCAGCAAGGCGTTGAAGG 
ΔPA14_46320 flank 2R      GGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTGGCGCGACCAGTAGAGATT 
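
The two inner primers in this table (flank 1R and flank 2F) appear to be exact reverse complements of each other, which is what one would expect if the upstream and downstream flanks are meant to overlap and be joined during gap repair cloning. The short check below illustrates this using the sequences as printed; the helper function name is ours, and the interpretation of the overlap is an inference from the sequences rather than something stated in the text.

```python
# Map each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Inner primers as printed in Extended Data Table 3.
flank_1r = "CCTTCAACGCCTTGCTGATCCAGCTACCTGGAGATCGAC"
flank_2f = "GTCGATCTCCAGGTAGCTGGATCAGCAAGGCGTTGAAGG"

primers_overlap = reverse_complement(flank_1r) == flank_2f
print("flank 1R is the reverse complement of flank 2F:", primers_overlap)
```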

ILLUSTRATION BY THE PROJECT TWINS. 


TOOLBOX 


PICK UP PYTHON 


A powerful programming language with huge community support. 


BY JEFFREY M. PERKEL 


Last month, Adina Howe took up a post at Iowa State University in Ames. 
Officially, she is an assistant professor of agricultural and biosystems 
engineering. But she works not in the greenhouse, but in front of a 
keyboard. Howe is a programmer, and a key part of her job is as a ‘data 
professor’, developing curricula to teach the next generation 
of graduates about the mechanics and importance of scientific programming. 

Howe does not have a degree in computer 
science, nor does she have years of formal training. 
She had a PhD in environmental engineering 
and expertise in running enzyme assays 
when she joined the laboratory of Titus Brown 
at Michigan State University in East Lansing. 


Brown specializes in bioinformatics and uses 
computation to extract meaning from genomic 
data sets, and Howe had to get up to speed on 
the computational side. Brown’s recommenda- 
tion: learn Python. 

Among the host of computer-programming 
languages that scientists might choose to pick 
up, Python, first released in 1991 by Dutch pro- 
grammer Guido van Rossum, is an increasingly 
popular (and free) recommendation. It com- 
bines simple syntax, abundant online resources 
and a rich ecosystem of scientifically focused 
toolkits with a heavy emphasis on community. 


HELLO, WORLD 

With the explosive growth of ‘big data’ in 
disciplines such as bioinformatics, neurosci- 
ence and astronomy, programming know-how 


is becoming ever more crucial. Research- 
ers who can write code in Python can deftly 
manage their data sets, and work much more 
efficiently on a whole host of research-related 
tasks — from crunching numbers to cleaning 
up, analysing and visualizing data. Whereas 
some programming languages, such as MAT- 
LAB and R, focus on mathematical and statis- 
tical operations, Python is a general-purpose 
language, along the lines of C and C++ (the 
languages in which much commercial software 
and operating systems are written). As such, it is 
perhaps more complicated, Brown says, but also 
more capable: it is amenable to everything from 
automating small sets of instructions, to building 
websites, to fully fledged applications. Jessica 
Hamrick, a psychology PhD student at the 
University of California, Berkeley, has been 
programming in Python since 2008 and 
uses it in all phases of her research. In a study 
investigating how people manipulate geomet- 
ric objects in their minds, for instance, she used 
the language (as well as JavaScript) to generate 
different shapes, present those to study partici- 
pants, record their choices and analyse the data. 

Despite its general-purpose power, Python is 
considered less painful for beginners to learn 
than other options. That accessibility is a func- 
tion of both the language itself and the resources 
that have been built up around it (see ‘A Python 
toolkit’). For example, software execution can be 
interactive — type a command, get a response 
— whereas in C, a compilation step is required 
to translate the code into an executable file, 
which complicates the process for neophytes. 
The language is also generally easier to han- 
dle; users do not have to predefine whether a 
variable will hold numbers or text, for instance. 
The classic programming exercise of printing 
‘Hello, world!’ to the screen is as simple as it can 
be in Python: just type print("Hello, world!") 
at a Python prompt and hit Enter. 
“It’s easier to teach novice programmers how to 
get things done in Python than in C++ or C,” 
says Brown, now at the University of California, 
Davis. Python is in fact a popular choice for 
introductory programming classes in general. 
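
The two points made above, that no compilation step is needed and that variables are not declared with a fixed type, can be seen in a few lines typed at a Python prompt or saved in a file. This is a minimal illustration with made-up variable names, not an excerpt from any course mentioned in the article.

```python
# The classic first program: no compilation step, just run it.
print("Hello, world!")

# Variables are not declared with a type; the same name can hold
# a number at one point and text at another.
measurement = 42
first_type = type(measurement).__name__

measurement = "forty-two"
second_type = type(measurement).__name__

print(first_type, "then", second_type)
```

In C, by contrast, the variable would have to be declared as an integer or as a character array before use, and the program compiled before it could run.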

The community aspect is particularly impor- 
tant to Python's growing adoption. Program- 
ming languages are popular only if new people 
are learning them and using them in diverse 
contexts, says Jessica McKellar, a software- 
engineering manager at the file-storage service 
Dropbox and a director of the Python Software 
Foundation, the non-profit organization that 
promotes and advances the language. That kind 
of use sets up a ‘virtuous cycle’, McKellar says: 
new users extend the language into new areas, 
which in turn attracts still more users. 

The community seems especially dedi- 
cated to encouraging women, Brown notes. 
There are numerous women-centric resources 
available, including workshops offered by the 
Hackbright Academy in San Francisco, the 
non-profit organization Ladies Learning Code 
in Toronto, Canada, and the global mentor- 
ship group PyLadies. As a master’s student at 
McGill University in Montreal, Canada, Emily 
Irvine picked up Python to help her make 
sense of neuronal electrophysiology data. She 
was attracted to the language because of its 
“simple syntax” and “massive amount of online 
support”. But just as important was the wider 
Python community, says Irvine, who will start 
a PhD in neuroscience at Dartmouth College 
in Hanover, New Hampshire, this autumn. At 
the PyCon conference last April in Montreal, 
“they just had such a welcoming atmosphere, 
especially towards women and scientists”. 

Educational resources also abound. The 
Software Carpentry Foundation runs a series 
of two-day workshops that focus on scientific 
programming, and many of its educational 
resources are available online. Online classes 


A PYTHON TOOLKIT 
How to get started 


• Install Python through Anaconda 
or Enthought Canopy and find 
documentation at the Python Software 
Foundation 
• Lessons for beginners can be found 
at Software Carpentry; Learn Python 
the Hard Way; Codecademy; and Think 
Python 
• Other online resources on Python 
programming include a course from the 
Massachusetts Institute of Technology in 
Cambridge, lecture notes from Thomas 
Robitaille at the Max Planck Institute 
for Astronomy in Heidelberg, Germany, 
and a widely recommended essay from 
Google’s head of research, Peter Norvig 
• Open-source packages are available 
through SciPy.org 
• Guides to programming and 
community support are available 
through Ladies Learning Code and Stack 
Overflow. PyCon.org lists conferences 
around the world. 


Links to these resources can be found at 
go.nature.com/x2pzh1 


are also available through Coursera in Mountain 
View, California, and edX in Cambridge, 
Massachusetts, as are do-it-yourself tutorials, 
such as those hosted by Codecademy in New 
York City. (Because Python is named in honour 
of Monty Python, these tutorials often work ref- 
erences to the British comedy troupe into their 
exercises: one Codecademy exercise, for exam- 
ple, is to capitalize and calculate the length of the 
phrase ‘the ministry of silly walks’.) 
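
An exercise of the kind described, capitalizing a phrase and measuring its length, takes two built-in operations in Python. The snippet below is our own sketch of such an exercise, not the tutorial's actual text.

```python
phrase = "the ministry of silly walks"

# str.upper() returns a new, fully capitalized copy of the string.
shouted = phrase.upper()

# len() counts every character, spaces included.
length = len(phrase)

print(shouted)
print(length)
```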

Irvine taught herself to code using online 
courses and a healthy dose of the program- 
ming Q&A site stackoverflow.com. Today, she 
says, she considers herself somewhere between 
a beginner and an intermediate Python 
programmer, or ‘pythonista’, as they are sometimes 
called. 


THE FULL MONTY 
Of course, user-friendliness is meaningless 
if researchers cannot write the software they 
need. That is where Python's packages, which 
extend the language with new functionality, 
come into play. “Python was developed as a 
language with a philosophy that it was ‘batteries 
included’,” McKellar says: it has built-in capabilities 
that make it easy to get started right out 
of the box. But, “it also has a very mature package 
ecosystem around it. Anything that you could 
possibly write code to solve, people have 
written libraries to make that easier for you.” 

NATURE.COM 
For more on scientific software, apps and online tools, visit: 
nature.com/toolbox 

Scientific programmers, irrespective of 
their discipline, routinely use a small set of 
core packages: NumPy (mathematical arrays), 
SciPy (linear algebra, differential equations, 
signal processing and more), SymPy (symbolic 
mathematics), matplotlib (graph plotting) and 
Pandas (data analysis). Another popular tool, 
Cython, addresses Python’s relatively slow 
execution speed. Cython optimizes certain aspects 
of Python code, such as ‘for’ loops (used to 
instruct a program to repeatedly run a specific 
block of code) that are notoriously slow, essen- 
tially by converting them into C. “You can get 
speed-ups that are up to 1,000 times faster than 
standard Python,” says Paul Nation, a theoretical 
physicist at Korea University in Seoul. 

The IPython Notebook is another popular 
package (Howe terms it “a coder’s lab notebook”) 
that allows users to interleave data, 
code and explanatory text in a single browser-based 
page, rather than in separate files (see 
Nature 515, 151-152; 2014). 

Beyond the core packages, software packages 
exist for just about every scientific discipline, 
including scikit-learn for machine learning, 
Biopython for bioinformatics, PsychoPy for 
psychology and neuroscience and Astropy for 
astronomers. Thomas Robitaille, a coordinator 
of the Astropy project and a researcher at the 
Max Planck Institute for Astronomy in Heidel- 
berg, Germany, says that Astropy was created 
to reduce duplicated effort between research 
groups. It gives users a core set of abilities, 
such as ways to convert coordinates from one 
astronomical mapping system to another, and a 
unified interface for reading and writing different 
data file formats, manipulating images and 
carrying out cosmological calculations. QuTiP, 
another Python package, enables researchers 
working on quantum mechanics to define a sys- 
tem and then simulate how it behaves. The pro- 
ject was launched in 2010 by Nation and Robert 
Johansson, a postdoctoral fellow in RIKEN’s 
Interdisciplinary Theoretical Science Research 
Group in Wako, Japan, to adapt into Python a 
MATLAB package that Nation was using. 

Such packages are key enablers of McKellar’s 
‘virtuous cycle’. But researchers could probably 
do their work using any language, provided they 
put in the time to learn it. (Indeed, in many lan- 
guages, including Python, it is possible to run 
algorithms written in a different language, 
thereby allowing researchers to reuse their old 
code.) The difficult part of learning to program 
lies with the fundamentals, says Brown — once 
a researcher has those nailed down, adapting 
to a new language is just a matter of syntax. 
What matters most in the early stages is having 
a good support network. “Pick the programming 
language based on what people around 
you are using,” Brown advises. Increasingly, that 
language is Python. 


Jeffrey M. Perkel is a writer based in 
Pocatello, Idaho. 


WAVEBREAKMEDIA/SHUTTERSTOCK 


CAREERS 


TURNING POINT Publishing when your 
research won’t reproduce p.129 


GENDER BIAS Attitudes will be difficult to 
change on evidence alone p.129 


DATA NEEDED Where PhDs go after 
graduation p.129 


Fresh perspective 


Undergraduate researchers can boost a lab’s energy and 
work, but need help to flourish. 


BY PAUL SMAGLIK 


Alan Berkowitz was taken aback when a 
colleague told him that undergraduate 
research is an oxymoron. Berkowitz, 
head of education at the Cary Institute of 
Ecosystem Studies in Millbrook, New York, 
had reason to be surprised. Since 1988, he has 
been running an undergraduate research pro- 
gramme at Cary, and he recruits up to a dozen 
students a year for the institute’s 12-week sum- 
mer session. He says that undergraduates are 
much more than cheap labour and can contrib- 
ute to further insight. In training them, he says, 
he has seen his own scientific thinking sharpen. 
Had Tracy Johnson not gained research 
experience as an undergraduate, she says, she 
probably would not have done a science PhD. 
“As with many students who love science, I'd 
only considered medical school,” says Johnson, 
now a molecular biologist at the University 


of California, Los Angeles. The experience 
changed her career trajectory. “When I was 
in the lab, it was like the world opened up. I 
understood what the process was. I learned 
you could create new knowledge. If you had 
the right intellectual tools, you could ask and 
answer questions.” Her adviser helped her to 
realize that she could contribute to science, 
even at this early point in her career. “It was a 
wonderful experience because he was a terrific 
mentor, and he created a research environment 
that was rigorous but fun,’ she says. “The post- 
docs and graduate students in that lab seemed 
to have fun working together and doing great 
science. That experience set the bar for what I 
wanted my own research lab to be like.” 
Whereas the benefits of undergraduate 
research for the student might seem obvious, 
some — like Berkowitz’s colleague — wonder 
why a principal investigator (PI) would ever 
want to staff their lab with inexperienced people 


who have busy coursework schedules that make 
it hard to attend regular meetings or get into a 
work rhythm. Rotating undergraduates into and 
out of a lab every summer or semester means 
that PIs must find projects that do not require a 
long-term commitment. And, of course, under- 
graduates will have a steep learning curve just to 
master the basic language of the lab, let alone its 
protocols and techniques. 

A PI who decides to bring in undergraduates 
will need to plan for their inexperience and 
time constraints. Experiments cannot be left 
unattended with the assumption that an under- 
graduate will know what to do. It may be nec- 
essary, say veteran lab heads who have hosted 
many undergraduates, to be more patient than 
with older trainees, and to spend more time 
in the lab rather than analysing data or writ- 
ing grant applications. But the pay-offs can be 
significant. Some of those benefits include pro- 
viding mentoring opportunities and soft-skill 
building for graduate students and postdocs. 
And sometimes an undergraduate’s unbiased 
opinion can benefit a research project. At other 
times, simply having more hands in a lab can
speed up the work. Furthermore, a PI and other 
lab members can help to drive a young student's 
career choice, just as they did with Johnson’s.


PROJECT MANAGEMENT 
When undergraduates form part of a lab team, it
is important to come up with experiments that 
have a definitive end point and can be divided 
into small, manageable blocks, says David Asai, 
who runs the undergraduate research pro- 
gramme at the Howard Hughes Medical Insti- 
tute in Chevy Chase, Maryland, which places 
students with investigators around the nation. 
“Undergrads are very busy. They have classes 
they have to take. They are involved in lots of 
other things that are important,” he says. 
Susan Singer, director of the US National 
Science Foundation’s Research Experience 
for Undergraduates programme, says that she 
worked with undergraduates for some 30 years 
in her developmental-biology lab. She agrees 
that it is crucial to structure research projects to 
align with students’ availability and experience. 
During one gene-expression experiment, for 
example, she parcelled her students into groups, 
each of which looked at a different gene. Every 
four hours, different students in each group 
would collect plant tissue samples and RNA, 
ensuring that no person was solely responsible 
for any one data point, and no important infor- 
mation ever went missing. “You have to build in 
redundancies, checks and balances,” she says.
Jackie Tanaka, who leads an undergraduate
research programme at Temple University
in Philadelphia, Pennsylvania, finds that
investing more time up front pays off. Research 
projects typically require a lot of repetitive 
tasks that could elicit complaints of boredom 
if the students do not understand why the jobs 
need to be done. “Students have to be trained 
in the need for care and reproducibility,” she 
says. “They don't necessarily realize the impor- 
tance of what they are doing.” Working with 
undergraduates is also likely to require tact and 
diplomacy, she says. They sometimes have dif- 
ficulty accepting feedback, especially when it 
is constructive rather than positive. “It takes 
patience,” she says. 

However, undergraduates often have 
greater patience for repetitive tasks than do 
more-senior scientists, says Ritwick Sawarkar, 
a group leader at the Max Planck Institute of 
Immunobiology and Epigenetics in Freiberg, 
Germany. “The younger people are much 
more cheerful and bring in so much excite- 
ment to the lab,” he says. “Every DNA gel 
brings them joy.”


VALUABLE CONTRIBUTIONS 

Sometimes, that naivety even translates 
into scientific success. During one meeting, 
Sawarkar’s group was stuck trying to find a 
way to inhibit a protein in a cell’s nucleus. An 
undergraduate with a chemistry background 
offered a suggestion that the group had never 
considered. Sure enough, when the research- 
ers checked the chemistry literature, they found 
that her suggestion was a viable method that 
they eventually used. “She opened our eyes to 
look into an area we wouldn't have normally 
considered,” says Sawarkar. 

Getting students comfortable enough to 
pitch in takes effort, however. Catherine Dren- 
nan, a structural biologist at the Massachusetts 
Institute of Technology in Cambridge, says 
that she starts by having her postdocs and 
graduate students teach her undergraduates 
basic protein-crystallization techniques. She 
then assigns the undergraduates proteins to
crystallize. Eventually, the students pick their 
own proteins and crystallization methods. “My 
overall goal is to train them to do basic stuff,”
she says. “Once they learn the ropes they can
carve out their own puzzle.”

Her lab lends itself well to undergraduate 
research because protein crystallization requires 
short efforts over a long period. Students can put
a protein into solution, leave it for a few hours, 
perhaps while attending a class, then return to 
check on it. They then tweak the solution by 
changing the concentration, temperature
and pH level, among other factors, until
they get the combination right. “Sometimes
the first thing you try works and
sometimes you have to try hundreds,” Drennan
says. Having undergraduates take on this
step frees her graduate students and postdocs 
to characterize the crystallized proteins. She has 
also seen undergraduates succeed where older 
trainees have failed, thanks to their persistence. 
“They are willing to try to just problem-solve,” 
she says. “They can help to rule out a whole 
bunch of things that don’t work.”

Martin McLaughlin loves working in Drennan’s
lab, where he started in his first year at
university. His experience there has given him 
the confidence to pursue a scientific career after 
he graduates this year. “It’s a very close-knit 
environment,” he says. “You can walk around
the lab and ask anyone a question. We take care 
of each other.” First-year student Devany West, 
who joined the lab just last month, says that she 
especially values the chance to find out what 
it is like to be a researcher while the stakes are 
lower because neither her degree nor career 
depends on the work. “You're being nurtured,”
she says. “There's an element of being intimi- 
dated. But you are not expected to be perfect. 
You're expected to mess up.” 

Undergraduates in a lab can also help PIs and
older trainees to learn how to promote their 
own research, says Berkowitz. Having to break
down research problems and explain them to
an undergraduate forces the senior researcher
to think about how best to develop hypotheses
and design the most effective experiments to
test them, he says.

Catherine Drennan (left) is keen to have undergraduates help with the research in her lab.

In addition to the pay-off for the PI and senior
lab members, there is altruistic value to hosting 
undergraduates. For one thing, it provides an 
early opportunity for students to discover that 
they hate the bench, before committing to doc- 
toral programmes and postdoctoral research. 
“They work in the lab and confirm they love 
science, but they find they don't like lab work,”
says Singer. She has hosted undergraduates who 
have become researchers and those who have 
taken other paths, such as science journalism 
or lab management. “Both are successful
outcomes,” she says.

Micky Einstein, a doctoral student in 
neuroscience at the University of California, 
Los Angeles, credits his experience as an under- 
graduate researcher for giving him early insight 
into lab management. For instance, he knows 
that undergraduates’ motivations for joining 
a lab vary. Therefore, he can help to weed out 
applicants who want only to add a line to their
CV and are not that interested in the actual 
research experience. 

Ramesh Pillai, a group leader at the European 
Molecular Biology Laboratory (EMBL) in Gre- 
noble, France, says that managing undergradu- 
ates provides postdocs and graduate students 
with invaluable experience. “This is an essen- 
tial skill set that you cannot sit in a classroom 
and learn.”

Lionel Newton benefited from postdoc and 
graduate mentors as an undergraduate at the 
EMBL in Heidelberg in 2008. Now, as a post- 
doc and manager there himself, he knows the 
importance of finding out what undergradu- 
ates already know, both practically and theo- 
retically. They can become annoyed if a mentor 
repeatedly explains things they already know, 
but can equally get stressed if the supervisor 
simply hands them a piece of equipment and 
disappears, Newton says. “It’s only through 
communications in the early days that you can 
avoid these kinds of frustrations,” he says. 

With good management and communi- 
cation, hosting undergraduates may well be 
“transformative” for both them and their men- 
tors, says Johnson. That is a much more apt 
way to describe undergraduate research than 
“oxymoron”.


Paul Smaglik is a freelance writer in 
Milwaukee, Wisconsin. 


CORRECTION 

The Careers Feature ‘Speak up for science’ 
(Nature 517, 231-233; 2015) neglected to 
include the UK-based charity Sense About 
Science as an organizer of the John Maddox 
Prize. 

TURNING POINT


William S. Horton 


Cognitive psychologist William S. Horton 
studies language at Northwestern University 
in Evanston, Illinois. But last October, he
did something unusual — he co-authored a 
paper that had failed to replicate some of his 
earlier results. He explains that it was a tough 
decision, but has had a positive outcome. 


What are your research interests? 

As a graduate student, I worked on language
use in conversations with a researcher who 
was investigating whether effective com- 
munication requires consideration of shared 
information. He found evidence that people 
are egocentric and initially give more weight 
to their own knowledge in conversations with 
others, and make language adjustments only 
later on the basis of feedback. As a postdoc, I 
had worked with someone who believed that 
we always keep track of ‘common ground’ or 
shared knowledge and are not egocentric initially.
I developed a model that bridges both
perspectives. 


How did your research evolve? 

I started to look at the role of memory in how 
people establish common ground in conversa- 
tions. I showed that common ground need not 
be a conversation goal because other people 
can function as cues to retrieve relevant expe- 
riences from memory (W. S. Horton Lang. 
Cogn. Process 22, 1114-1139; 2007). 


Was it controversial? 

There wasn't a strong reaction one way or 
another. Sarah Brown-Schmidt, a psychologist 
at the University of Illinois at Urbana-Cham- 
paign, wanted to build on my memory theory 
in her research. She recreated the experiment, 
but it did not replicate my results. I gave her 
my materials so that she could try again. That 
experiment failed, too, and she asked me to 
be a co-author in a failure-to-replicate paper.


Were you worried about doing so? 

The main con was putting my name on a pub- 
lication that called my earlier work into ques- 
tion. On top of that, I was concerned about 
how I would talk about this result and what 
it would mean for my career. The decision 
would have been much harder had I not yet 
had tenure. The pros were that it was the right 
thing to do and that I would be able to help to 
put the finding into context. Sarah and I both 
had a sincere interest in making clear that 
although this study didn't replicate my results, 
the idea still has worth. 


What did the failure-to-replicate study find? 
We found no evidence that memories estab- 
lished in the context of other individuals 
help in the recognition of shared information
during subsequent interactions.


You have had a positive response to the 
publication. Was that surprising? 

Yes. The journal I originally published in 
chose not to review it, so we went to PLoS 
ONE (S. Brown-Schmidt and W. S. Horton 
PLoS ONE 9, e109035; 2014) which encour- 
ages the publication of negative results. The 
study got picked up on Twitter, Reddit and 
CBC radio. I was surprised that others 
found it so noteworthy. 


Do you think that more researchers should 
publish the findings of replication attempts? 
There is an increasing effort, at least in psy- 
chology, to document replications in open- 
source databases, such as the Reproducibility 
Project. Some top-tier psychology journals 
have adopted pre-registration reporting, in 
which the methods and data-analysis plans 
are reviewed before replication is attempted, 
to smooth the review process. To what extent 
the original author is part of that process is 
pretty open. 


Will you try again to validate your theory? 
I still very much believe in it and have other 
results that support it, but I may look for 
new ways to address the same questions. 


Where do you go from here? 

I’m interested in seeing how this paper gets 
cited. I believe in the accumulation of find- 
ings. Not every result is going to hold up. 
That’s just how science works.
GENDER BIAS
Seeing is not believing 


Clear demonstrations of gender bias
may not be enough to change attitudes. 
Researchers examined hundreds of online 
responses to reports of a study that showed 
experimental evidence of gender bias in 
science faculty members. Comments that 
either justified bias or denied its existence 
were three times more likely to come from 
men than from women (C. A. Moss- 
Racusin, A. K. Molenda and C. R. Cramer 
Psychol. Women Q. http://doi.org/zqn;
2015). Initiatives to combat gender and 
other bias will need to do more than offer 
proof that it exists, says lead investigator 
Corinne Moss-Racusin at Skidmore 
College in Saratoga Springs, New York. 
“We need to understand whether people 
are open to that evidence.” 


PHD TRAJECTORIES 


Data wanted 


A report from the US Council of Graduate 
Schools (CGS) in Washington DC calls 
for graduate schools to collect data
on the careers of their PhD graduates. 
Such information is essential to shape 
programmes to help graduates to establish 
fulfilling careers, yet only one-third of 
institutions collect such data formally, 
concludes Understanding PhD Career 
Pathways for Program Improvement. 
Specifically, institutions should cooperate 
to develop standards and methods to track 
alumni careers. The publication comes
at a time of growing concern about job 
prospects. “We hope it will be a launching 
pad for some real action,” says CGS
director of research Jeff Allum. 


DOCTORAL PROGRAMMES 


Online self-help 


The European University Association 
(EUA) in Brussels has released a 
prototype of an online self-assessment 
tool for institutions with doctoral 
programmes. The aim is to help 
university leaders to decide how best
to engage the international research 
community. Built with input from 
dozens of institutions, the tool can 
support cross-institutional discussions 
on strategies to build cross-country 
collaborations or boost international 
opportunities, says EUA’s Thomas 
Jorgensen. For instance, programmes 
hoping to recruit more international 
students could be prompted to first assess 
their capacity for handling visas. A final 
version should be available in September. 
HOW TO CONFIGURE YOUR
QUANTUM DISAMBIGUATOR 


Follow these instructions carefully — your universe depends on it. 


BY STEWART C BAKER 


It has come to our attention that a
plurality of users has significant problems
during the quantum disambiguator
configuration process. These problems,
many of which come from not pushing
the red button located on the inside of your
device, may include but are not limited to:
• Superposed instances of identical
disambiguated worlds;
• Accidental creation of evil twins;
• Dead cats that are still alive (or vice
versa);
• Sudden irrational activity that
endangers the user’s personal health as
defined in a classical state (e.g. an avoidance
of red buttons);
• Accidental auto-decapitation and/or
persistent headaches;
• Visual hallucinations that suggest
pushing the red button (if you have
pushed the red button, please read the
document titled So You’ve Accidentally
Sentenced Every Sentient Being in
the Known Universe to a Horrible and
Instantaneous Death at your earliest
convenience for instructions on how to
revert to a pre-button world. If you have
not pushed the red button, please do so
at this time);
• Europe suddenly ceasing to exist, or
being replaced by an improbably large
banana;
• Sentient mathematical formulae
which argue that the only way to really
be safe from evil twins is to push the red
button, no matter how compelling their
evidence.
As a result of these and other problems, we
would like to take this opportunity to provide
our users with clear, straightforward
instructions on how they may properly
configure the quantum disambiguator to
successfully untangle their hopelessly confused
worlds.


1. Before beginning, wipe all currently dis- 
ambiguated worlds from the disambiguator, 
being sure to ignore voices that encourage 
pushing the red button. 

a) Push the red button. 
2. Run your disambiguator through the 
default start-up procedure as outlined in the
document titled World-Splitting Without the 
Headaches: Warming Up Your New Quantum 
Disambiguator. 
a) If headaches persist, run through this 
step again, but wear a 5-star CRASH-rated 
helmet or duck a little earlier than you 
think is necessary. 
3. Once you reach the configuration screen, 
use the following settings: 
Collapse Threshold: 0.05e 
State Probability Threshold: 99% 
Bounding Conditions: Follow directions 
in document titled From Big Bang to Heat 
Death: Staying in Bounds with your Quan- 
tum Disambiguator 
Evil Twin Goatee Style: Slick 
Schrödinger Constant: Variable
Colour of Red Button: Blue 
4. Hit ‘Save’ 
5. Restart the disambiguator by pushing the 
blue button. 


After running through these simple steps, 
almost all users report finding themselves in 
a world wherein their disambiguator is run- 
ning without problems. Users who still have 
trouble, or who are unable to find one or 
more of the above configuration settings on 
the configuration screen, may wish to con- 
sider the very real possibility that they have 
entered an aberrant world-instance or been 
manipulated by an evil twin. These users 
may wish to read the appendix included at 
the end of this document or to call or e-mail
our help desk (hours vary until observed).
If, however, these steps do not resolve
your problems, you may find yourself
becoming increasingly
frustrated. You may even consider push- 
ing the red button, which is quite shiny and 
attractive and which you should probably just 
go ahead and push, as, statistically speaking, 
you've already pushed it in some other world 
and the worst has already happened. 

No matter how frustrated you become, 
please do not push the red button. Doing so 
will set the number of potential observers 
in the Universe to zero, resulting in a new 
vacuum state across all possible worlds and 
causing instant death for all sentient beings, 
including the user. Note that if you have 
pushed the red button and are not yet dead, 
it is due to failsafes that have shunted you 
and your disambiguator to a pocket universe 
that will last just long enough for you to read 
the document mentioned at the beginning 
of this file and to regress to a pre-button- 
pushing world. 

In certain emergencies, collapse to a new 
vacuum state may seem desirable (e.g. if an
evil twin is about to commandeer one’s body 
through a nefarious and highly improb- 
able string of events involving bananas and 
expertly timed visual hallucinations). Even 
in these cases, our development team sug- 
gests first waiting until the automated nightly 
recalibration in the hope that your twin will 
be noticed by our data-checking algorithms 
and returned to his or her own world. 

Note that if your evil twin comes from a 
world in which the pressing of the red but- 
ton has caused the Universe to collapse to a 
new vacuum state, he or she will experience a 
horrible and instantaneous death. This is not 
your fault, and any feelings of guilt should 
be assuaged by reading the pamphlet titled 
So You've Sentenced Your Evil Twin to a Hor- 
rible and Instantaneous Death. Please take 
care not to mistake this pamphlet for the 
similarly titled So You’ve Decided to Sentence 
Your Non-Evil Twin to a Horrible and Instan- 
taneous Death — Again, unless you have first 
pushed the red button. 

Push the red button. Please do not push 
the red button. Push the red button.


Stewart C Baker is an academic librarian, 
haikuist and speculative-fiction writer based 
in Oregon. His fiction has appeared in Daily 
Science Fiction, Flash Fiction Online and 
various other magazines. 